Cyber Center

Research - Publications

  • 07/07/2014: Cross-Domain and Cross-Category Emotion Tagging for Comments of Online News (Ying Zhang, Ning Zhang, Luo Si, Yanshan Lu, Qifan Wang, Xiaojie Yuan)

    In many online news services, users often write comments on news articles expressing subjective emotions such as sadness, happiness or anger. Knowing such emotions can help understand the preferences and perspectives of individual users, and therefore may help online publishers provide more relevant services to users. Although building emotion classifiers is a practical task, it depends heavily on sufficient training data, which is not easy to collect directly, and manually labeling comments can be quite labor intensive. Also, online news has different domains, which makes the problem even harder, as the different word distributions of the domains require different classifiers with correspondingly distinct training data.

    This paper addresses the task of emotion tagging for comments of cross-domain online news. The cross-domain task is formulated as a transfer learning problem which utilizes a small amount of labeled data from a target news domain and abundant labeled data from a different source domain. This paper proposes a novel framework to transfer knowledge across different news domains. More specifically, different approaches have been proposed for when the two domains share the same set of emotion categories and for when they use different categories. An extensive set of experimental results on four datasets from popular online news services demonstrates the effectiveness of our proposed models in cross-domain emotion tagging for comments of online news, in both the scenario of sharing the same emotion categories and that of having different categories in the source and target domains.

  • 07/03/2014: Active Hashing with Joint Data Example and Tag Selection (Qifan Wang, Luo Si)

    Similarity search is an important problem in many large scale applications such as image and text retrieval. Hashing methods have become popular for similarity search due to their fast search speed and low storage cost. Recent research has shown that hashing quality can be dramatically improved by incorporating supervised information, e.g. semantic tags/labels, into hashing function learning. However, most existing supervised hashing methods can be regarded as passive methods, which assume that the labeled data are provided in advance. But in many real world applications, such supervised information may not be available.

    This paper proposes a novel active hashing approach, Active Hashing with Joint Data Example and Tag Selection (AH-JDETS), which actively selects the most informative data examples and tags in a joint manner for hashing function learning. In particular, it first identifies a set of informative data examples and tags for users to label based on the selection criteria that both the data examples and tags should be most uncertain and dissimilar with each other. Then this labeled information is combined with the unlabeled data to generate an effective hashing function.

    An iterative procedure is proposed for learning the optimal hashing function and selecting the most informative data examples and tags. Extensive experiments on four different datasets demonstrate that AH-JDETS achieves good performance compared with state-of-the-art supervised hashing methods but requires much less labeling cost, which overcomes the limitation of passive hashing methods. Furthermore, experimental results also indicate that the joint active selection approach outperforms a random (non-active) selection method and active selection methods only focusing on either data examples or tags.

  • 07/03/2014: Preference Preserving Hashing for Efficient Recommendation (Zhiwei Zhang, Qifan Wang, Lingyun Ruan, Luo Si)

    Recommender systems usually need to compare a large number of items before users' most preferred ones can be found. This process can be very costly if recommendations are frequently made on large scale datasets. In this paper, a novel hashing algorithm, named Preference Preserving Hashing (PPH), is proposed to speed up recommendation. Hashing has been widely utilized in large scale similarity search (e.g. similar image search), and the search speed with binary hashing code is significantly faster than that with real-valued features. However, one challenge of applying hashing to recommendation is that recommendation concerns users' preferences over items rather than their similarities. To address this challenge, PPH contains two novel components that work with the popular matrix factorization (MF) algorithm. In MF, users' preferences over items are calculated as the inner product between the learned real-valued user/item features. The first component of PPH constrains the learning process, so that users' preferences can be well approximated by user-item similarities. The second component, which is a novel quantization algorithm, generates the binary hashing code from the learned real-valued user/item features. Finally, recommendation can be achieved efficiently via fast hashing code search. Experiments on three real world datasets show that the recommendation speed of the proposed PPH algorithm can be hundreds of times faster than original MF with real-valued features, and the recommendation accuracy is significantly better than previous work of hashing for recommendation.
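
    To make the role of the binary codes concrete, the following sketch approximates MF inner-product preference scores with fast comparisons over binary codes. It is an illustration only: the sign-based quantizer and all dimensions here are invented stand-ins, not PPH's learned quantizer.

        # Approximate MF preference scores with binary codes (baseline sketch).
        import numpy as np

        rng = np.random.default_rng(0)
        d = 16                                 # latent dimension (illustrative)
        U = rng.normal(size=(1000, d))         # user factors from matrix factorization
        V = rng.normal(size=(50000, d))        # item factors

        def to_code(X):
            """Quantize real-valued factors to +/-1 codes (sign thresholding)."""
            return np.where(X >= 0, 1, -1).astype(np.int8)

        Bu, Bv = to_code(U), to_code(V)

        user = 7
        exact = V @ U[user]                    # real-valued preference scores
        approx = Bv @ Bu[user]                 # code agreement = d - 2 * Hamming distance

        # Rank items by the fast binary scores; a real system packs the codes into
        # machine words and compares them with XOR/popcount for the claimed speedups.
        top10 = np.argsort(-approx)[:10]
        print(top10, exact[top10])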

  • 06/05/2014: Draft Genome Sequence of Acetobacter aceti Strain 1023, a Vinegar Factory Isolate (John Hung, Christopher Mill, Sandra Clifton, Vincent Magrini, Ketaki Bhide, Julie Francois, Aaron Ransome, Lucinda Fulton, Jyothi Thimmapuram, Richard Wilson, T. Joseph Kappock)

    The genome sequence of Acetobacter aceti 1023, an acetic acid bacterium adapted to traditional vinegar fermentation, comprises 3.0 Mb (chromosome plus plasmids). A. aceti 1023 is closely related to the cocoa fermenter Acetobacter pasteurianus 386B but possesses many additional insertion sequence elements.

  • 06/01/2014: Privacy of Outsourced k-means Clustering (Dongxi Liu, Elisa Bertino, Xun Yi)

    It is attractive for an organization to outsource its data analytics to a service provider who has powerful platforms and advanced analytics skills. However, the organization (data owner) may have concerns about the privacy of its data. In this paper, we present a method that allows the data owner to encrypt its data with a homomorphic encryption scheme and the service provider to perform k-means clustering directly over the encrypted data. However, since the ciphertexts resulting from homomorphic encryption do not preserve the order of distances between data objects and cluster centers, we propose an approach that enables the service provider to compare encrypted distances with the trapdoor information provided by the data owner. The efficiency of our method is validated by extensive experimental evaluation.
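
    One building block can be sketched with an additively homomorphic scheme: a server computing an encrypted squared distance between an encrypted point and a plaintext cluster center. The sketch below uses the python-paillier (phe) package with made-up values, and omits the paper's trapdoor-based comparison of encrypted distances.

        from phe import paillier  # python-paillier: additively homomorphic encryption

        pub, priv = paillier.generate_paillier_keypair(n_length=1024)

        x = [3.0, 5.0]                              # data owner's point
        enc_x = [pub.encrypt(v) for v in x]         # Enc(x_j)
        enc_x2 = [pub.encrypt(v * v) for v in x]    # Enc(x_j^2), uploaded alongside

        center = [1.0, 2.0]                         # plaintext center at the server

        # Squared distance sum_j (x_j^2 - 2*c_j*x_j + c_j^2), computed on ciphertexts:
        enc_dist = pub.encrypt(0)
        for ex, ex2, c in zip(enc_x, enc_x2, center):
            enc_dist = enc_dist + ex2 + ex * (-2 * c) + c * c

        print(priv.decrypt(enc_dist))               # 13.0 = (3-1)^2 + (5-2)^2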

  • 05/01/2014: A practical approach for provenance transmission in wireless sensor networks (S.M. Iftekharul Alam, Sonia Fahmy)

    Assessing the trustworthiness of sensor data and transmitters of this data is critical for quality assurance. Trust evaluation frameworks utilize data provenance along with the sensed data values to compute the trustworthiness of each data item. However, in a sizeable multi-hop sensor network, provenance information requires a large and variable number of bits in each packet, resulting in high energy dissipation due to the extended period of radio communication. In this paper, we design energy-efficient provenance encoding and construction schemes, which we refer to as Probabilistic Provenance Flow (PPF). Our work demonstrates the feasibility of adapting the Probabilistic Packet Marking (PPM) technique in IP traceback to wireless sensor networks. We design two bit-efficient provenance encoding schemes along with a complementary vanilla scheme. Depending on the network size and bit budget, we select the best method based on mathematical approximations and numerical analysis. We integrate PPF with provenance-based trust frameworks and investigate the trade-off between trustworthiness of data items and transmission overhead. We conduct TOSSIM simulations with realistic wireless links, and perform testbed experiments on 15–20 TelosB motes to demonstrate the effectiveness of PPF. Our results show that the encoding schemes of PPF have identical performance with a low bit budget (∼32-bit), requiring 33% fewer packets and 30% less energy than PPM variants to construct provenance. With a twofold increase in bit budget, PPF with the selected encoding scheme reduces energy consumption by 46–60%.
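
    The marking idea PPF adapts from PPM can be simulated in a few lines: each forwarding node overwrites the packet's single mark slot with some probability, and the base station keeps collecting packets until every hop has been observed. Path length, probability, and field layout below are illustrative, not PPF's bit-efficient encodings.

        import random

        path = ["n1", "n2", "n3", "n4", "n5"]    # source -> ... -> base station
        p = 0.3                                   # marking probability per hop

        def send_packet():
            mark = None
            for hop, node in enumerate(path):
                if random.random() < p:
                    mark = (node, hop)            # later nodes may overwrite earlier marks
            return mark

        seen, packets = set(), 0
        while len(seen) < len(path):              # collect until every hop was observed
            m = send_packet()
            packets += 1
            if m is not None:
                seen.add(m)

        print(f"path recovered after {packets} packets")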

  • 04/01/2014: Practical k Nearest Neighbor Queries with Location Privacy (Xun Yi, Russell Paulet, Elisa Bertino, Vijay Varadharajan)

    In mobile communication, spatial queries pose a serious threat to user location privacy because the location of a query may reveal sensitive information about the mobile user. In this paper, we study k nearest neighbor (kNN) queries where the mobile user queries the location-based service (LBS) provider about k nearest points of interest (POIs) on the basis of his current location. We propose a solution for the mobile user to preserve his location privacy in kNN queries. The proposed solution is built on the Paillier public-key cryptosystem and can provide both location privacy and data privacy. In particular, our solution allows the mobile user to retrieve one type of POIs, for example, k nearest car parks, without revealing to the LBS provider what type of points is retrieved. For a cloaking region with n×n cells and m types of points, the total communication complexity for the mobile user to retrieve a type of k nearest POIs is O(n+m), while the computation complexities of the mobile user and the LBS provider are O(n+m) and O(n²m), respectively. Compared with existing solutions for kNN queries with location privacy, our solutions are more efficient. Experiments have shown that our solutions are practical for kNN queries.

  • 03/01/2014: Representation and querying of unfair evaluations in social rating systems (Mohammad Allahbakhsh, Aleksandar Ignjatovic, Boualem Benatallah, Seyed-Mehdi-Reza Beheshti, Norman Foo, Elisa Bertino)

    Social rating systems are subject to unfair evaluations. Users may try to individually or collaboratively promote or demote a product. Detecting unfair evaluations, mainly massive collusive attacks as well as honest-looking intelligent attacks, is still a real challenge for collusion detection systems. In this paper, we study the impact of unfair evaluations in online rating systems. First, we study individual unfair evaluations and their impact on the reputation of people as calculated by social rating systems. We then propose a method for detecting collaborative unfair evaluations, also known as collusion. The proposed model uses a frequent itemset mining technique to detect the candidate collusion groups and sub-groups. We use several indicators to identify collusion groups and to estimate how destructive such colluding groups can be. The approaches presented in this paper have been implemented in prototype tools, and experimentally validated on synthetic and real-world datasets.
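
    The collusion-candidate step can be pictured as frequent itemset mining over "which raters scored which product" transactions. The toy below mines only frequent rater pairs (2-itemsets) over hard-coded data; the paper's method mines larger groups and then applies the damage indicators.

        from collections import Counter
        from itertools import combinations

        # product -> set of raters who gave it an extreme (e.g., promoting) score
        transactions = {
            "p1": {"u1", "u2", "u3"},
            "p2": {"u1", "u2", "u4"},
            "p3": {"u1", "u2", "u3"},
            "p4": {"u5"},
        }
        min_support = 2

        pair_counts = Counter()
        for raters in transactions.values():
            for pair in combinations(sorted(raters), 2):
                pair_counts[pair] += 1

        candidates = {pair: c for pair, c in pair_counts.items() if c >= min_support}
        print(candidates)   # {('u1','u2'): 3, ('u1','u3'): 2, ('u2','u3'): 2}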

  • 03/01/2014: A formal proximity model for RBAC systems (Aditi Gupta, Michael Kirkpatrick, Elisa Bertino)

    To combat the threat of information leakage through pervasive access, researchers have proposed several extensions to the popular role-based access control (RBAC) model. Such extensions can incorporate contextual features, such as location, into the policy decision in an attempt to restrict access to trustworthy settings. In many cases, though, such extensions fail to reflect the true threat, which is the presence or absence of other users, rather than absolute locations. For instance, for location-aware separation of duty, it is more important to ensure that two people are in the same room than in a designated, pre-defined location. Prox-RBAC was proposed as an extension to consider the relative proximity of other users with the help of a pervasive monitoring infrastructure. However, that work offered only an informal view of proximity, and unnecessarily restricted the domain to spatial concerns. In this work, we present a more rigorous definition of proximity based on formal topological relations. In addition, we show that this definition can be applied to several additional domains, such as social networks, communication channels, attributes, and time; thus, our policy model and language are more flexible and powerful than the previous work. Beyond proposing the model, we present a number of theoretical results for such systems, including a complexity analysis, templates for cryptographic protocols, and proofs of security features.

  • 01/01/2014: IEEE TDSC Editorial (Elisa Bertino)

    IEEE TDSC, under the leadership of its founding EIC, Professor Ravi Iyer, and then under the leadership of two very distinguished EICs, Professor Virgil Gligor and Professor Ravi Sandhu, has established itself as a prestigious publication venue for research in the dependability and security disciplines. During the last three years, under the leadership of Professor Ravi Sandhu, IEEE TDSC has been very successful in increasing the number and quality of published papers. This is reflected in an impact factor for IEEE TDSC of 1.059 (per the 2012 Thomson Reuters Journal Citation Report, released in June 2013), which sets IEEE TDSC above several other journals in the area of computer and information security. My plan is to continue on this path of excellence and work towards increasing the impact and visibility of IEEE TDSC. I would like to take this opportunity to acknowledge the tremendous work done by the previous EICs, which has resulted in IEEE TDSC becoming a premier venue for security research publication.

    IEEE TDSC was founded with the overarching vision that even though the fields of dependability and security have originated and grown separately, there are many synergies between these two disciplines. In the end, the safe and effective use of computer systems requires the use and integration of techniques and methods from both. As the previous EICs did, I will continue to foster these synergies. A specific action that I will take will be the organization of a special issue devoted to multidisciplinary research across these two disciplines.

    I also believe that dependability and security are highly dependent on specific application domains. I thus encourage the submission of papers that identify and address security requirements arising from novel application domains, such as location-based systems and pervasive systems, to name a few. Privacy research will also be an editorial focus, as I believe that devising approaches that are secure and dependable and at the same time preserve privacy is a crucial challenge. Actions that I will take include the invitation of papers focusing on novel research perspectives and surveys, as well as special issues focusing on specific application domains.

    Besides traditional research papers, my plan is also to foster the publication of experimental papers. Prominent database conferences, such as the VLDB Conference, some years ago introduced a new track for experiments and analyses papers to facilitate in-depth (analytical or empirical) studies and comparisons of existing techniques. In the past, such papers were likely to be rejected for lack of novel contributions. I believe that it is healthy and necessary for our community to consolidate a maturing research area, and that papers which systematically evaluate existing techniques and shed further insight on their effectiveness are important. I would like to see IEEE TDSC providing an avenue for experimental research in dependability and security. Examples of such experimental research include benchmarks of cryptographic protocol implementations and experimental assessment of defense techniques by benchmarks and testbeds.

    I would like to conclude by emphasizing the importance of all the individuals contributing to IEEE TDSC: the associate EICs, the editorial board members, the reviewers, the authors, the IEEE Computer Society staff, and the production crews. Their active contributions and dedication are critical for IEEE TDSC to continue on its path of success. I look forward to working with you all.

  • 01/01/2014: Lightweight and Secure Two-Party Range Queries over Outsourced Encrypted Databases (Bharath Samanthula, Wei Jiang, Elisa Bertino)

    With the many benefits of cloud computing, an entity may want to outsource its data and their related analytics tasks to a cloud. When data are sensitive, it is in the interest of the entity to outsource encrypted data to the cloud; however, this limits the types of operations that can be performed on the cloud side. In particular, evaluating queries over the encrypted data stored on the cloud, without the entity performing any computation and without ever decrypting the data, becomes a very challenging problem. In this paper, we propose solutions to conduct range queries over outsourced encrypted data. The existing methods leak valuable information to the cloud which can violate the security guarantee of the underlying encryption schemes. In general, the main security primitive used to evaluate range queries is secure comparison (SC) of encrypted integers. However, we observe that the existing SC protocols are not very efficient. To this end, we first propose a novel SC scheme that takes encrypted integers and outputs an encrypted comparison result. We empirically show its practical advantage over the current state-of-the-art. We then utilize the proposed SC scheme to construct two new secure range query protocols. Our protocols protect data confidentiality and the privacy of the user's query, and also preserve the semantic security of the encrypted data; therefore, they are more secure than the existing protocols. Furthermore, our second protocol is lightweight at the user end, and it can allow an authorized user to use any device with limited storage and computing capability to perform range queries over outsourced encrypted data.

  • 01/01/2014: Detecting mobile malware threats to homeland security through static analysis (Seung-Hyun Seo, Aditi Gupta, Asmaa Mohamed Sallam, Kangbin Yim)

    Recent years have seen a significant increase in the popularity of smartphones. This popularity has been accompanied by an equally alarming rise in mobile malware. Recently released mobile malware targeting Android devices has been found to specifically focus on root exploits to obtain root-level access and execute instructions from a remote server. Thus, this kind of mobile malware presents a significant threat to Homeland Security. This is possible because smartphones can serve as zombie devices which are then controlled by hackers via a C&C server. In this paper, we discuss the defining characteristics inherent in mobile malware and show mobile attack scenarios which are feasible against Homeland Security. We also propose a static analysis tool, DroidAnalyzer, which identifies potential vulnerabilities of Android apps and the presence of root exploits. Then, we analyze various mobile malware samples and target apps, such as banking, flight tracking and booking, and home and office monitoring apps, to examine potential vulnerabilities by applying DroidAnalyzer.
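
    A signature scan of the kind such a static analyzer might run over decompiled sources can be sketched as follows; the indicator patterns and directory layout are invented for illustration and are not DroidAnalyzer's actual rule set.

        import re
        from pathlib import Path

        # Hypothetical indicators of root exploits and remote-command behavior.
        INDICATORS = {
            "root_shell": re.compile(r"Runtime\.getRuntime\(\)\.exec\([\"'](su|/system/xbin/su)"),
            "known_exploit": re.compile(r"rageagainstthecage|exploid|gingerbreak", re.I),
            "dynamic_load": re.compile(r"DexClassLoader", re.I),
        }

        def scan_sources(root: str) -> dict:
            """Report which indicator fired in which decompiled source file."""
            hits = {}
            for src in Path(root).rglob("*.java"):
                text = src.read_text(errors="ignore")
                for name, pat in INDICATORS.items():
                    if pat.search(text):
                        hits.setdefault(name, []).append(str(src))
            return hits

        print(scan_sources("decompiled_app/"))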

  • 01/01/2014: Flow-based partitioning of network testbed experiments (Wei-Min Yao, Sonia Fahmy)

    Understanding the behavior of large-scale systems is challenging, but essential when designing new Internet protocols and applications. It is often infeasible or undesirable to conduct experiments directly on the Internet. Thus, simulation, emulation, and testbed experiments are important techniques for researchers to investigate large-scale systems.

    In this paper, we propose a platform-independent mechanism to partition a large network experiment into a set of small experiments that are sequentially executed. Each of the small experiments can be conducted on a given number of experimental nodes, e.g., the available machines on a testbed. Results from the small experiments approximate the results that would have been obtained from the original large experiment. We model the original experiment using a flow dependency graph. We partition this graph, after pruning uncongested links, to obtain a set of small experiments. We execute the small experiments iteratively. Starting with the second iteration, we model dependent partitions using information gathered about both the traffic and the network conditions during the previous iteration. Experimental results from several simulation and testbed experiments demonstrate that our techniques approximate performance characteristics, even with closed-loop traffic and congested links. We expose the fundamental tradeoff between the simplicity of the partitioning and experimentation process, and the loss of experimental fidelity.
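
    The partitioning step can be sketched with networkx: model flows and links as a dependency graph, prune links that stay uncongested, and treat the remaining connected components as sub-experiments to run sequentially. The topology and the utilization threshold below are illustrative.

        import networkx as nx

        # (u, v, utilization): links with low utilization do not couple flows
        links = [("a", "b", 0.95), ("b", "c", 0.20), ("c", "d", 0.90), ("d", "e", 0.15)]
        CONGESTION_THRESHOLD = 0.8

        g = nx.Graph()
        g.add_nodes_from(n for l in links for n in l[:2])
        for u, v, util in links:
            if util >= CONGESTION_THRESHOLD:    # keep only links that create dependence
                g.add_edge(u, v)

        partitions = [sorted(c) for c in nx.connected_components(g)]
        print(partitions)   # e.g. [['a', 'b'], ['c', 'd'], ['e']] -> run sequentially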

  • 01/01/2014: IdentiDroid: Android can finally Wear its Anonymous Suit (Bilal Shebaro, Oluwatosin Ogunwuyi, Daniele Midi, Elisa Bertino)

    Because privacy today is a major concern for mobile applications, network anonymizers are widely available on smartphones, such as Android. However, despite the use of such anonymizers, in many cases applications are still able to identify the user and the device by means other than the IP address. The reason is that very often applications require device services and information that go beyond the capabilities of anonymous networks in protecting users' identity and privacy. In this paper, we propose two solutions that address this problem. The first solution is based on an approach that shadows user and application data, device information, and resources that can reveal the user identity. Data shadowing is executed when the smartphone switches to the "anonymous modality". Once the smartphone returns to work in the normal (i.e., non-anonymous) modality, application data, device information and resources are returned back to the state they had before the anonymous connection. The second solution is based on run-time modifications of Android application permissions. Permissions associated with sensitive information are dynamically revoked at run-time from applications when the smartphone is used under the anonymous modality. They are re-instated when the smartphone returns to work in the normal modality. In addition, both solutions offer protection from applications that identify their users through traces left in the application's data storage or through exchanging identifying data messages. We developed IdentiDroid, a customized Android operating system, to deploy these solutions, and built IdentiDroid Profile Manager, a profile-based configuration tool that allows one to set different configurations for each installed Android application. With this tool, applications running within the same device are configured to be given different identifications and privileges to limit the uniqueness of device and user information. We analyzed 250 Android applications to determine what information, services, and permissions can identify users and devices. Our experiments show that when IdentiDroid is deployed and properly configured on Android devices, users' anonymity is better guaranteed by either of the proposed solutions with no significant impact on most device applications.

  • 01/01/2014: MED18 interaction with distinct transcription factors regulates multiple plant functions (Z. Lai, C. Schluttenhofer, Ketaki Bhide, J. Schreve, Jyothi Thimmapuram, S.Y. Lee, D.J. Yun, T. Mengiste)

    Mediator is an evolutionarily conserved transcriptional regulatory complex. Mechanisms of Mediator function are poorly understood. Here we show that Arabidopsis MED18 is a multifunctional protein regulating plant immunity, flowering time and responses to hormones through interactions with distinct transcription factors. MED18 interacts with YIN YANG1 to suppress disease susceptibility genes glutaredoxins GRX480, GRXS13 and thioredoxin TRX-h5. Consequently, yy1 and med18 mutants exhibit deregulated expression of these genes and enhanced susceptibility to fungal infection. In addition, MED18 interacts with ABA INSENSITIVE 4 and SUPPRESSOR OF FRIGIDA4 to regulate abscisic acid responses and flowering time, respectively. MED18 associates with the promoter, coding and terminator regions of target genes suggesting its function in transcription initiation, elongation and termination. Notably, RNA polymerase II occupancy and histone H3 lysine tri-methylation of target genes are affected in the med18 mutant, reinforcing MED18 function in different mechanisms of transcriptional control. Overall, MED18 conveys distinct cues to engender transcription underpinning plant responses.

  • 12/01/2013: The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation (Hansang Bae, Dheya Mustafa, Jaewoo Lee, Aurangzeb, Hao Lin, Chirag Dave, Rudolf Eigenmann, Samuel Midkiff)

    This paper provides an overview and an evaluation of the Cetus source-to-source compiler infrastructure. The original goal of the Cetus project was to create an easy-to-use compiler for research in automatic parallelization of C programs. In the meantime, Cetus has been used for many additional program transformation tasks. It serves as a compiler infrastructure for many projects in the US and internationally. Recently, Cetus has been supported by the National Science Foundation to build a community resource. The compiler has gone through several iterations of benchmark studies and implementations of those techniques that could improve the parallel performance of these programs. These efforts have resulted in a system that compares favorably with state-of-the-art parallelizers, such as Intel's ICC. A key limitation of advanced optimizing compilers is their lack of runtime information, such as the program input data. We will discuss and evaluate several techniques that support dynamic optimization decisions. Finally, as there is an extensive body of proposed compiler analyses and transformations for parallelization, the question of the importance of the techniques arises. This paper evaluates the impact of the individual Cetus techniques on overall program performance.

  • 12/01/2013: Compiler Infrastructure (Rudolf Eigenmann, Samuel Midkiff)

    This special issue presents a number of papers that discuss the design of compiler infrastructures and their use in research and education projects. Early work described in these papers was presented at the "Cetus Users and Compiler Infrastructure Workshop," held in conjunction with the International Conference on Parallel Architectures and Compilation Techniques, PACT'2011, in Galveston, Texas, on October 10, 2011. The focus on the Cetus compiler is evident in the present papers as well. An attempt was made, however, to cover other infrastructure platforms too. To this end, papers in the special issue also discuss the LLVM, OpenUH, and Rose platforms. These special issue papers have all been reviewed according to the journal's high-quality standards for publication. The availability of advanced compiler infrastructures is of critical importance. It enables the implementation of a new program analysis, optimization, or transformation pass in the context of a complete translation platform that can handle realistic applications and thus evaluate the new pass quantitatively. As different research projects have different needs, more than a single compiler infrastructure is needed. It is not the goal of the special issue to advertise one best infrastructure. Rather, the reader can learn about features of different platforms and identify the one that is most useful for their research plan.

    The included papers pursue this aim as follows: The first article, by Bae et al., provides an overview of the Cetus compiler infrastructure. The second paper, by Yi Yang and Huiyang Zhou, describes an optimizing compiler for GPGPU architectures that is built on the Cetus infrastructure. Next, Gabriel Rodríguez et al. compare two infrastructures, LLVM and Cetus, by using them in a project to create a compiler that provides fault tolerance for message-passing applications. Amin Sarvestani et al. have created a tool for idiom recognition in digital signal processing applications. They describe the use of the Cetus infrastructure for that purpose. Barbara Chapman et al. present the OpenUH compiler infrastructure and its use in research and education projects. Finally, Xipeng Shen et al. describe the use of the Cetus and Rose infrastructures in the creation of an auto-tuning system that adapts compiler optimizations for GPUs based on characteristics of the program input.

  • 12/01/2013: RFID Security and Privacy (Yingjiu Li, Robert H. Deng, Elisa Bertino)

    As a fast-evolving new area, RFID security and privacy has quickly grown from a hungry infant to an energetic teenager during recent years. Much of the exciting development in this area is summarized in this book with rigorous analyses and insightful comments. In particular, a systematic overview on RFID security and privacy is provided at both the physical and network level. At the physical level, RFID security means that RFID devices should be identified with assurance in the presence of attacks, while RFID privacy requires that RFID devices should be identified without disclosure of any valuable information about the devices. At the network level, RFID security means that RFID information should be shared with authorized parties only, while RFID privacy further requires that RFID information should be shared without disclosure of valuable RFID information to any honest-but-curious server which coordinates information sharing. Not only does this book summarize the past, but it also provides new research results, especially at the network level. Several future directions are envisioned to be promising for advancing the research in this area.

  • 11/01/2013: Privacy preserving policy based content sharing in public clouds (Mohamed Nabeel, Ning Shang, Elisa Bertino)

    An important problem in public clouds is how to selectively share documents based on fine-grained attribute-based access control policies (acps). An approach is to encrypt documents satisfying different policies with different keys using a public key cryptosystem such as attribute-based encryption, and/or proxy re-encryption. However, such an approach has some weaknesses: it cannot efficiently handle adding/revoking users or identity attributes, and policy changes; it requires keeping multiple encrypted copies of the same documents; and it incurs high computational costs. A direct application of a symmetric key cryptosystem, where users are grouped based on the policies they satisfy and unique keys are assigned to each group, also has similar weaknesses. We observe that, without utilizing public key cryptography and by allowing users to dynamically derive the symmetric keys at the time of decryption, one can address the above weaknesses. Based on this idea, we formalize a new key management scheme, called broadcast group key management (BGKM), and then give a secure construction of a BGKM scheme called ACV-BGKM. The idea is to give some secrets to users based on the identity attributes they have and later allow them to derive actual symmetric keys based on their secrets and some public information. A key advantage of the BGKM scheme is that adding/revoking users or updating acps can be performed efficiently by updating only some public information. Using our BGKM construct, we propose an efficient approach for fine-grained encryption-based access control for documents stored in an untrusted cloud file storage.
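
    A much-simplified sketch of the BGKM workflow follows: each user holds a secret tied to identity attributes, the actual group key is derived from that secret plus public information, and revocation rewrites only the public bulletin. This hash-based stand-in only conveys the workflow; the actual ACV-BGKM construction is based on access control vectors and differs in important ways.

        import os, hashlib

        def h(*parts: bytes) -> bytes:
            return hashlib.sha256(b"|".join(parts)).digest()

        # Per-user secrets issued based on identity attributes (illustrative).
        user_secrets = {u: os.urandom(16) for u in ["alice", "bob", "carol"]}

        def publish(authorized, group_key, nonce):
            """Owner posts, per authorized user, group_key XOR H(secret, nonce)."""
            return {u: bytes(a ^ b for a, b in zip(group_key, h(user_secrets[u], nonce)))
                    for u in authorized}

        def derive(user, bulletin, nonce):
            pad = h(user_secrets[user], nonce)
            return bytes(a ^ b for a, b in zip(bulletin[user], pad))

        key, nonce = os.urandom(32), os.urandom(16)
        bulletin = publish({"alice", "bob"}, key, nonce)   # revoking carol = omit her
        assert derive("alice", bulletin, nonce) == key     # derived from public info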

  • 10/01/2013: Encryption Key Management for Secure Communication in Smart Advanced Metering Infrastructures (Seung-Hyun Seo, Xiaoyu Ding, Elisa Bertino)

    Smart grid technology can improve environmental sustainability and increase the efficiency of energy management. Because of these important benefits, conventional power grid systems are being replaced with new, advanced smart grid systems utilizing Advanced Metering Infrastructures (AMIs). These smart grid systems rely on current information and communication technology (ICT) to provide enhanced services to both users and utility companies. However, the increased use of ICT makes smart grid systems vulnerable to cyber-attacks, such as spoofing, eavesdropping and man-in-the-middle attacks. A major security concern is related to secure data transmission between the smart meters and the utility. Encryption techniques are typically used for such purpose. However, the deployment of encryption techniques in an AMI requires efficient and scalable approaches for managing encryption keys. In this paper, we propose an efficient encryption key management mechanism for end-to-end security in the AMI. By applying certificateless public key cryptography for smart meter key management, our approach eliminates certificate management overhead at the utility. Moreover, our mechanism is practical, because it does not require any extra hardware for authentication of the smart meters.

  • 09/01/2013: Introduction to the special issue on data quality (Mourad Ouzzani, Paolo Papotti, Erhard Rahm)

    Poor data quality in databases, data warehouses, and information systems affects every application domain. Many data processing tasks, such as information integration, data sharing, information retrieval, information extraction, and knowledge discovery require various forms of data preparation and consolidation with complex data processing techniques. These tasks usually assume that the data input contains no missing, inconsistent or incorrect values. This leaves a large gap between the available “dirty” data and the machinery to effectively process the data for the application purposes. In addition, tasks such as data integration and information extraction may themselves introduce errors in the data.

  • 08/14/2013: CRIS — Computational research infrastructure for science (Eduard Dragut, Peter Baker, Jai Xu, Muhammed Sarfraz, Elisa Bertino, Raghu Agarwal, Aamer Mahmood, Sangchun Han)

    The challenges facing the scientific community are common and real: conduct relevant and verifiable research in a rapidly changing collaborative landscape with an ever increasing scale of data. It has come to a point where research activities cannot scale at the rate required without improved cyberinfrastructure (CI). In this paper we describe CRIS (the Computational Research Infrastructure for Science), whose primary tenets are to provide an easy-to-use, scalable, and collaborative scientific data management and workflow cyberinfrastructure for scientists lacking extensive computational expertise. Some of the key features of CRIS are: 1) semantic definition of scientific data using domain vocabularies; 2) embedded provenance for all levels of research activity (data, workflows, tools, etc.); 3) easy integration of existing heterogeneous data and computational tools on local or remote computers; 4) automatic data quality monitoring for syntactic and domain standards; and 5) shareable yet secure access to research data, computational tools and equipment. CRIS currently has a community of users in Agronomy, Biochemistry, Bioinformatics and Healthcare Engineering at Purdue University (cris.cyber.purdue.edu).

  • 07/01/2013: An Experimental Framework for BGP Security Evaluation (Debbie Perouli, Olaf Maennel, Iain Phillips, Sonia Fahmy, Randy Bush, Rob Austein)

    Internet routing is based on implicit trust assumptions. Given the critical importance of the Internet and the increasing security threats, such simple trust relationships are no longer sufficient. Over the past decade, significant research has been devoted to securing the Internet routing system. The Internet Engineering Task Force (IETF) is well along in the process of standardizing routing security enhancements (Secure Inter-Domain Routing — SIDR, Keying and Authentication for Routing Protocols — KARP, etc.). However, the research challenges are not over: not only do these new protocols need to be tested for protocol conformance and interoperability, they also need to be evaluated both for their security properties and scaling performance. The purpose of this paper is two-fold: we outline the main security challenges in inter-domain routing and argue that research in this area has barely begun; and we take a closer look at a production implementation of one component and evaluate it at a fairly large scale. We discuss the difficulties we experienced and lessons learned; we also present some initial results.

  • 06/01/2013: Forecasting user visits for online display advertising (Suleyman Cetintas, Datong Chen, Luo Si)

    Online display advertising is a multi-billion dollar industry where advertisers promote their products to users by having publishers display their advertisements on popular Web pages. An important problem in online advertising is how to forecast the number of user visits for a Web page during a particular period of time. Prior research addressed the problem by using traditional time-series forecasting techniques on historical data of user visits (e.g., via a single regression model built for forecasting based on historical data for all Web pages), and did not fully explore the fact that different types of Web pages and different time stamps have different patterns of user visits. In this paper, we propose a series of probabilistic latent class models to automatically learn the underlying user visit patterns among multiple Web pages and multiple time stamps. The last (and the most effective) proposed model identifies latent groups/classes of (i) Web pages and (ii) time stamps with similar user visit patterns, and learns a specialized forecast model for each latent Web page and time stamp class. Compared with a single regression model as well as several other baselines, the proposed latent class model approach has the capability of differentiating the importance of different types of information across different classes of Web pages and time stamps, and therefore has much better modeling flexibility. An extensive set of experiments along with detailed analysis carried out on real-world data from Yahoo! demonstrates the advantage of the proposed latent class models in forecasting online user visits in online display advertising.

  • 06/01/2013: Similarity queries: their conceptual evaluation, transformation and processing (Yasin Silva, Walid G. Aref, Per-Ake Larson, Spencer Pearson, Mohamed Ali)

    Many application scenarios can significantly benefit from the identification and processing of similarities in the data. Even though some work has been done to extend the semantics of some operators, for example join and selection, to be aware of data similarities, there has not been much study on the role and implementation of similarity-aware operations as first-class database operators. Furthermore, very little work has addressed the problem of evaluating and optimizing queries that combine several similarity operations. The focus of this paper is the study of similarity queries that contain one or multiple first-class similarity database operators such as Similarity Selection, Similarity Join, and Similarity Group-by. Particularly, we analyze the implementation techniques of several similarity operators, introduce a consistent and comprehensive conceptual evaluation model for similarity queries, and present a rich set of transformation rules to extend cost-based query optimization to the case of similarity queries.
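
    As a plain rendering of one such operator, the sketch below implements a naive similarity (epsilon) join on 1-D numeric keys; real engines implement this as a first-class operator with index support and the transformation rules the paper develops.

        def similarity_join(r, s, eps):
            """Naive epsilon-join: pair tuples whose distance is within eps.
            O(|R|*|S|) for clarity; production operators use indexes."""
            return [(x, y) for x in r for y in s if abs(x - y) <= eps]

        R, S = [1.0, 5.2, 9.9], [1.2, 5.0, 20.0]
        print(similarity_join(R, S, eps=0.3))   # [(1.0, 1.2), (5.2, 5.0)]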

  • 05/01/2013: Multi-route query processing and optimization (Rimma Nehme, Karen Works, Chuan Lei, Elke Rundensteiner, Elisa Bertino)

    A modern query optimizer typically picks a single query plan for all data based on overall data statistics. However, many have observed that real-life datasets tend to have non-uniform distributions. Selecting a single query plan may result in ineffective query execution for possibly large portions of the actual data. In addition, most stream query processing systems, given the volume of data, cannot precisely model the system state, much less account for uncertainty due to continuous variations. Such systems select a single query plan based upon imprecise statistics. In this paper, we present "Query Mesh" (or QM), a practical alternative to state-of-the-art data stream processing approaches. The main idea of QM is to compute multiple routes (i.e., query plans), each designed for a particular subset of the data with distinct statistical properties. We use the terms "plans" and "routes" interchangeably in our work. A classifier model is induced and used to assign the best route to process incoming tuples based upon their data characteristics. We formulate the QM search space and analyze its complexity. Due to the substantial search space, we propose several cost-based query optimization heuristics designed to effectively find nearly optimal QMs. We propose the Self-Routing Fabric (SRF) infrastructure that supports query execution with multiple plans without physically constructing their topologies nor using a central router like Eddy. We also consider how to support uncertain route specification and execution in QM, which can occur when imprecise statistics lead to more than one optimal route for a subset of data. Our experimental results indicate that QM consistently provides better query execution performance and incurs negligible overhead compared to the alternative state-of-the-art data stream approaches.
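
    QM's runtime idea can be caricatured in a few lines: a classifier routes each tuple to the operator ordering that is cheapest for its statistical subset. The predicates and the routing rule below are invented for illustration.

        def is_recent(t):    return t["age_days"] < 30     # cheap predicate
        def matches_text(t): return "sale" in t["desc"]    # expensive predicate

        ROUTES = {
            "filter-first": [is_recent, matches_text],     # cheap filter first
            "text-first":   [matches_text, is_recent],     # costly filter first
        }

        def classify(t):
            # Stand-in for QM's induced classifier: suppose EU tuples are mostly
            # old, so the cheap recency filter prunes them before the costly one.
            return "filter-first" if t["region"] == "EU" else "text-first"

        def run(t):
            for op in ROUTES[classify(t)]:                 # execute chosen route
                if not op(t):
                    return False
            return True

        print(run({"region": "EU", "age_days": 90, "desc": "big sale"}))   # False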

  • 04/01/2013: Pegasus: Precision hunting for icebergs and anomalies in network flows (Sriharsha Gangam, P. Sharma, Sonia Fahmy)

    Accurate online network monitoring is crucial for detecting attacks, faults, and anomalies, and determining traffic properties across the network. With high bandwidth links and consequently increasing traffic volumes, it is difficult to collect and analyze detailed flow records in an online manner. Traditional solutions that decouple data collection from analysis resort to sampling and sketching to handle large monitoring traffic volumes. We propose a new system, Pegasus, to leverage commercially available co-located compute and storage devices near routers and switches. Pegasus adaptively manages data transfers between monitors and aggregators based on traffic patterns and user queries. We use Pegasus to detect global icebergs or global heavy-hitters. Icebergs are flows with a common property that contribute a significant fraction of network traffic. For example, DDoS attack detection is an iceberg detection problem with a common destination IP. Other applications include identification of “top talkers,” top destinations, and detection of worms and port scans. Experiments with Abilene traces, sFlow traces from an enterprise network, and deployment of Pegasus as a live monitoring service on PlanetLab show that our system is accurate and scales well with increasing traffic and number of monitors.
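
    The global iceberg computation itself is simple to sketch: monitors report only flows above a small local threshold, and the aggregator flags flows whose summed count crosses the global threshold. Thresholds and counts below are illustrative; the interesting part of Pegasus is managing this exchange adaptively.

        from collections import Counter

        monitor_counts = [
            Counter({"10.0.0.1": 900, "10.0.0.2": 40, "10.0.0.3": 300}),
            Counter({"10.0.0.1": 800, "10.0.0.2": 35}),
            Counter({"10.0.0.3": 700, "10.0.0.2": 30}),
        ]
        LOCAL_MIN, GLOBAL_MIN = 50, 1000        # illustrative thresholds

        aggregate = Counter()
        for local in monitor_counts:
            for flow, count in local.items():
                if count >= LOCAL_MIN:           # monitors ship only significant flows
                    aggregate[flow] += count

        icebergs = {f: c for f, c in aggregate.items() if c >= GLOBAL_MIN}
        print(icebergs)   # {'10.0.0.1': 1700, '10.0.0.3': 1000}; .2 filtered locally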

  • 03/01/2013: Active Learning with Optimal Instance Subset Selection (Yifan Fu, Xingquan Zhu, Ahmed Elmagarmid)

    Active learning (AL) traditionally relies on some instance-based utility measures (such as uncertainty) to assess individual instances and label the ones with the maximum values for training. In this paper, we argue that such approaches cannot produce good labeling subsets mainly because instances are evaluated independently without considering their interactions, and individuals with maximal ability do not necessarily form an optimal instance subset for learning. Alternatively, we propose to achieve AL with optimal subset selection (ALOSS), where the key is to find an instance subset with a maximum utility value. To achieve the goal, ALOSS simultaneously considers the following: 1) the importance of individual instances and 2) the disparity between instances, to build an instance-correlation matrix. As a result, AL is transformed to a semidefinite programming problem to select a k-instance subset with a maximum utility value. Experimental results demonstrate that ALOSS outperforms state-of-the-art approaches for AL.
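
    A greedy stand-in conveys the subset objective that ALOSS solves exactly via semidefinite programming: score a candidate set by total importance minus a pairwise-similarity penalty, then grow the set greedily. The data, the similarity kernel, and the trade-off weight below are invented.

        import numpy as np

        rng = np.random.default_rng(1)
        X = rng.normal(size=(200, 8))                       # unlabeled pool
        importance = rng.uniform(size=200)                  # stand-in uncertainty scores
        sim = np.exp(-np.linalg.norm(X[:, None] - X[None], axis=2))  # similarity matrix

        def subset_utility(idx, lam=0.5):
            """Total importance minus a penalty for picking similar instances."""
            pairwise = sim[np.ix_(idx, idx)].sum() - len(idx)   # drop self-similarity
            return importance[idx].sum() - lam * pairwise / 2

        def greedy_select(k):
            chosen = []
            for _ in range(k):
                best = max((i for i in range(len(X)) if i not in chosen),
                           key=lambda i: subset_utility(chosen + [i]))
                chosen.append(best)
            return chosen

        print(greedy_select(5))   # diverse, high-uncertainty instances to label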

  • 02/01/2013: Collaboration in multicloud computing environments: Framework and security issues (M. Singhal, S. Chandrasekhar, Tingjian Ge, R. Sandhu, R. Krishnan, Gail-Joon Ahn, Elisa Bertino)

    A proposed proxy-based multicloud computing framework allows dynamic, on-the-fly collaborations and resource sharing among cloud-based services, addressing trust, policy, and privacy issues without preestablished collaboration agreements or standardized interfaces.

  • 01/01/2013: PostgreSQL anomalous query detector (Bilal Shebaro, Asmaa Sallam, Ashish Kamra)

    We propose to demonstrate the design, implementation, and capabilities of an anomaly detection (AD) system integrated with a relational database management system (DBMS). Our AD system is trained by extracting relevant features from the parse-tree representation of the SQL commands, and then uses the DBMS roles as the classes for the Bayesian classifier. In the detection phase, the maximum a priori probability role is chosen by the classifier; if it does not match the role associated with the SQL command, an alarm is raised. We have implemented such a system in the PostgreSQL DBMS, integrated with the statistics collection and the query processing mechanism of the DBMS. During the demonstration, our audience will be given the choice of training our system using either synthetic role-based SQL query traces based on probability sampling, or by entering their own set of training queries. In the subsequent detection mode, the audience can test the detection capabilities of the system by submitting arbitrary SQL commands. We will also allow the audience to generate arbitrary workloads to measure the overhead of the training phase and the detection phase of our AD mechanism on the performance of the DBMS.
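
    The detection idea condenses to a few lines: featurize each SQL command, train a naive Bayes classifier with roles as classes, and raise an alarm when the predicted role differs from the issuing role. The regex featurizer below is deliberately cruder than the parse-tree features used by the system, and the training data is made up.

        import re
        from collections import Counter, defaultdict

        def features(sql):
            verb = sql.strip().split()[0].upper()
            tables = tuple(sorted(re.findall(r"(?:FROM|INTO|UPDATE)\s+(\w+)", sql, re.I)))
            return (verb, tables)

        train = [
            ("clerk",   "SELECT name FROM customers"),
            ("clerk",   "SELECT total FROM orders"),
            ("analyst", "SELECT avg(total) FROM orders"),
            ("admin",   "UPDATE users SET role = 'x'"),
        ]

        prior, likelihood = Counter(), defaultdict(Counter)
        for role, sql in train:
            prior[role] += 1
            likelihood[role][features(sql)] += 1

        def most_likely_role(sql):
            f = features(sql)
            # Laplace-smoothed naive Bayes score per role
            return max(prior, key=lambda r:
                       prior[r] * (likelihood[r][f] + 1) / (prior[r] + len(likelihood[r])))

        cmd, issuer = "UPDATE users SET role = 'y'", "clerk"
        if most_likely_role(cmd) != issuer:
            print("ALARM: command looks anomalous for role", issuer)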

  • 01/01/2013: Efficient and accurate strategies for differentially-private sliding window queries (Jianneng Cao, Qian Xiao, Gabriel Ghinita, Ninghui Li, Elisa Bertino, Kian-Lee Tan)

    Regularly releasing the aggregate statistics about data streams in a privacy-preserving way not only serves valuable commercial and social purposes, but also protects the privacy of individuals. This problem has already been studied under differential privacy, but only for the case of a single continuous query that covers the entire time span, e.g., counting the number of tuples seen so far in the stream. However, most real-world applications are window-based, that is, they are interested in the statistical information about streaming data within a window, instead of the whole unbound stream. Furthermore, a Data Stream Management System (DSMS) may need to answer numerous correlated aggregated queries simultaneously, rather than a single one. To cope with these requirements, we study how to release differentially private answers for a set of sliding window aggregate queries. We propose two solutions, each consisting of query sampling and composition. We first selectively sample a subset of representative sliding window queries from the set of all the submitted ones. The representative queries are answered by adding Laplace noises in a way satisfying differential privacy. For each non-representative query, we compose its answer from the query results of those representatives. The experimental evaluation shows that our solutions are efficient and effective.
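
    The release strategy can be sketched as follows: a few representative window queries are answered directly with Laplace noise, and other overlapping windows are composed from those noisy answers rather than consuming extra privacy budget. Window sizes, epsilon, and the composition rule below are illustrative, not the paper's sampling algorithm.

        import numpy as np

        rng = np.random.default_rng(2)
        stream = rng.integers(0, 5, size=120)       # per-tick counts
        epsilon, sensitivity = 0.5, 1.0

        def noisy_window_sum(lo, hi):
            """Answer one representative window count under the Laplace mechanism."""
            return stream[lo:hi].sum() + rng.laplace(scale=sensitivity / epsilon)

        # Representatives: disjoint 30-tick blocks, each answered directly under DP.
        blocks = {(lo, lo + 30): noisy_window_sum(lo, lo + 30) for lo in range(0, 120, 30)}

        # A non-representative query over [0, 60) is composed from two block answers,
        # so its error grows but no additional privacy budget is consumed.
        composed = blocks[(0, 30)] + blocks[(30, 60)]
        print(composed, stream[:60].sum())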

  • 01/01/2013: Efficient privacy-aware record integration (Mehmet Kuzu, Murat Kantarcioglu, Ali Inan, Elisa Bertino, Elizabeth Durham, Bradley Malin)

    The integration of information dispersed among multiple repositories is a crucial step for accurate data analysis in various domains. In support of this goal, it is critical to devise procedures for identifying similar records across distinct data sources. At the same time, to adhere to privacy regulations and policies, such procedures should protect the confidentiality of the individuals to whom the information corresponds. Various private record linkage (PRL) protocols have been proposed to achieve this goal, involving secure multi-party computation (SMC) and similarity preserving data transformation techniques. SMC methods provide secure and accurate solutions to the PRL problem, but are prohibitively expensive in practice, mainly due to excessive computational requirements. Data transformation techniques offer more practical solutions, but incur the cost of information leakage and false matches.

    In this paper, we introduce a novel model for practical PRL, which 1) affords controlled and limited information leakage, and 2) avoids false matches resulting from data transformation. Initially, we partition the data sources into blocks to eliminate comparisons for records that are unlikely to match. Then, to identify matches, we apply an efficient SMC technique between the candidate record pairs. To enable efficiency and privacy, our model leaks a controlled amount of obfuscated data prior to the secure computations. The applied obfuscation relies on differential privacy, which provides strong privacy guarantees against adversaries with arbitrary background knowledge. In addition, we illustrate the practical nature of our approach through an empirical analysis with data derived from public voter records.

  • 01/01/2013: Efficient tree pattern queries on encrypted XML documents (Jianneng Cao, Fang-Yu Rao, Mehmet Kuzu, Elisa Bertino, Murat Kantarcioglu)

    Outsourcing XML documents is a challenging task, because it requires encrypting the documents while still supporting efficient query processing. Past approaches on this topic either leak structural information or fail to support searching that has constraints on XML node content. In addition, they adopt a filtering-and-refining framework, which requires the users to prune false positives from the query results. To address these problems, we present a solution for efficient evaluation of tree pattern queries (TPQs) on encrypted XML documents. We create a domain hierarchy, such that each XML document can be embedded in it. By assigning each node in the hierarchy a position, we create for each document a vector, which encodes both the structural and textual information about the document. Similarly, a vector is created also for a TPQ. Then, the matching between a TPQ and a document is reduced to calculating the distance between their vectors. For the sake of privacy, such vectors are encrypted before being outsourced. To improve the matching efficiency, we use a k-d tree to partition the vectors into non-overlapping subsets, such that non-matchable documents are pruned as early as possible. The extensive evaluation shows that our solution is efficient and scalable to large datasets.
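
    A toy, unencrypted rendering of the vector encoding: fix a position for each (path, keyword) feature of the domain hierarchy, encode documents and queries as 0/1 vectors, and reduce TPQ matching to a vector test. The feature set is invented; in the paper the vectors are encrypted and a k-d tree prunes non-matchable documents.

        import numpy as np

        FEATURES = ["book/title", "book/author", "book/price", "kw:xml", "kw:crypto"]
        POS = {f: i for i, f in enumerate(FEATURES)}

        def encode(features):
            """0/1 vector with a fixed position per (path, keyword) feature."""
            v = np.zeros(len(FEATURES), dtype=np.int8)
            for f in features:
                v[POS[f]] = 1
            return v

        doc   = encode(["book/title", "book/author", "kw:xml"])
        query = encode(["book/title", "kw:xml"])    # TPQ: title node containing "xml"

        # The document matches if it contains every feature the query requires
        # (equivalently, doc . q == q . q):
        matches = bool(np.all(doc[query == 1] == 1))
        print(matches)                               # True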

  • 01/01/2013: Adaptive data protection in distributed systems (Anna Squicciarini, Giuseppe Petracca, Elisa Bertino)

    Security is an important barrier to wide adoption of distributed systems for sensitive data storage and management. In particular, one unsolved problem is to ensure that customers' data protection policies are honored, regardless of where the data is physically stored and how often it is accessed, modified, and duplicated. This issue calls for two requirements to be satisfied. First, data should be managed in accordance with both owners' preferences and the local regulations that may apply. Second, although multiple copies may exist, a consistent view across copies should be maintained. Toward addressing these issues, in this work we propose innovative policy enforcement techniques for adaptive sharing of users' outsourced data. We introduce the notion of autonomous self-controlling objects (SCO), which, by means of object-oriented programming techniques, encapsulate sensitive resources and assure their protection by means of adaptive security policies of various granularity and synchronization protocols. Through extensive evaluation, we show that our approach is effective and efficiently manages multiple data copies.

  • 01/01/2013: FENCE: continuous access control enforcement in dynamic data stream environments (Rimma Nehme, Hyo-Sang Lim, Elisa Bertino)

    In this paper, we address the problem of continuous access control enforcement in dynamic data stream environments, where both data and query security restrictions may potentially change in real-time. We present the FENCE framework, which effectively addresses this problem. The distinguishing characteristics of FENCE include: (1) the stream-centric approach to security, (2) the symmetric model for security settings of both continuous queries and streaming data, and (3) two alternative security-aware query processing approaches that can optimize query execution based on regular and security-related selectivities. In FENCE, both data and query security restrictions are modeled symmetrically in the form of security metadata, called "security punctuations," embedded inside data streams. We distinguish between two types of security punctuations, namely the data security punctuations (dsps for short), which represent the access control policies of the streaming data, and the query security punctuations (qsps for short), which describe the access authorizations of the continuous queries. We also present our encoding method to support the XACML (eXtensible Access Control Markup Language) standard. We have implemented FENCE in a prototype DSMS and present our performance evaluation. The results of our experimental study show that FENCE's approach has low overhead and can give great performance benefits compared to the alternative security solutions for streaming environments.

    View the Full Publication
  • 01/01/2013An efficient certificateless cryptography scheme without pairingSeung-Hyun Seo, Mohamed Nabeel, Xiaoyu Ding, Elisa Bertino

    We propose a mediated certificateless encryption scheme without pairing operations. Mediated certificateless public key encryption (mCL-PKE) solves the key escrow problem in identity based encryption and certificate revocation problem in public key cryptography. However, existing mCL-PKE schemes are either inefficient because of the use of expensive pairing operations or vulnerable against partial decryption attacks. In order to address the performance and security issues, in this poster, we propose a novel mCL-PKE scheme. We implement our mCL-PKE scheme and a recent scheme, and evaluate the security and performance. Our results show that our algorithms are efficient and practical.

    View the Full Publication
  • 01/01/2013A file provenance systemSalmin Sultana, Elisa Bertino

    A file provenance system supports the automatic collection and management of provenance, i.e., the complete processing history of a data object. File-system-level provenance provides functionality unavailable in existing provenance systems. In this paper, we discuss the design objectives for a flexible and efficient file provenance system and then propose the design of such a system, called FiPS. We design FiPS as a thin stackable file system for capturing provenance in a portable manner. FiPS can capture provenance at various degrees of granularity, can transform provenance records into secure information, and can direct the resulting provenance data to various persistent storage systems.

    View the Full Publication
  • 01/01/2013Collusion Detection in Online Rating SystemsMohammad Allahbakhsh, Aleksandar Ignjatovic, Boualem Benatallah, Seyed-Mehdi-Reza Beheshti, Elisa Bertino, Norman Foo

    Online rating systems are subject to unfair evaluations. Users may try to individually or collaboratively promote or demote a product. Collaborative unfair rating, i.e., collusion, is more damaging than individual unfair rating. Detecting massive collusive attacks as well as honest-looking intelligent attacks is still a real challenge for collusion detection systems. In this paper, we study the impact of collusion in online rating systems and assess their susceptibility to collusion attacks. The proposed model uses a frequent itemset mining technique to detect candidate collusion groups and sub-groups. Then, several indicators are used for identifying collusion groups and for estimating how damaging such colluding groups might be. The model has been implemented and we present the results of an experimental evaluation of our methodology.
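
    A hedged illustration of the candidate-generation step: treating each product's set of reviewers as a transaction, groups of reviewers who co-rate many common products are frequent itemsets and thus collusion candidates. The toy data and thresholds are assumptions; the paper's indicators would then score these candidates.

    ```python
    # Frequent-itemset-style mining of reviewer groups that co-rate at least
    # `min_support` common products (collusion *candidates* only).
    from itertools import combinations
    from collections import Counter

    ratings = {  # product -> reviewers who rated it (toy data)
        "p1": {"a", "b", "c"},
        "p2": {"a", "b", "c"},
        "p3": {"a", "b"},
        "p4": {"d"},
    }

    def candidate_groups(ratings, group_size=2, min_support=2):
        support = Counter()
        for reviewers in ratings.values():
            for group in combinations(sorted(reviewers), group_size):
                support[group] += 1
        return [g for g, s in support.items() if s >= min_support]

    print(candidate_groups(ratings, group_size=3, min_support=2))
    # -> [('a', 'b', 'c')]: the trio co-rated p1 and p2
    ```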

    View the Full Publication
  • 01/01/2013Single-Database Private Information Retrieval from Fully Homomorphic EncryptionXun Yi, Mohammed Golam Kaosar, Russell Paulet, Elisa Bertino

    Private Information Retrieval (PIR) allows a user to retrieve the $i$-th bit of an $n$-bit database without revealing to the database server the value of $i$. In this paper, we present a PIR protocol with the communication complexity of $O(\gamma \log n)$ bits, where $\gamma$ is the ciphertext size. Furthermore, we extend the PIR protocol to a private block retrieval (PBR) protocol, a natural and more practical extension of PIR in which the user retrieves a block of bits, instead of a single bit. Our protocols are built on the state-of-the-art fully homomorphic encryption (FHE) techniques and provide privacy for the user if the underlying FHE scheme is semantically secure. The total communication complexity of our PBR is $O(\gamma \log m + \gamma n/m)$ bits, where $m$ is the number of blocks. The total computation complexity of our PBR is $O(m \log m)$ modular multiplications plus $O(n/2)$ modular additions. In terms of total protocol execution time, our PBR protocol is more efficient than existing PBR protocols, which usually require computing $O(n/2)$ modular multiplications when the size of a block in the database is large and a high-speed network is available.
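
    As a hedged arithmetic illustration of the stated complexities (sample values $n = 2^{20}$, $m = 2^{10}$ are assumed, and constants hidden by the $O$-notation are ignored):

    ```latex
    % Plugging example values into the abstract's complexity bounds.
    % Retrieve one block from an n-bit database split into m blocks,
    % with ciphertext size \gamma.
    \[
      n = 2^{20},\quad m = 2^{10}:\qquad
      O(\gamma \log m) + O(\gamma n/m)
      \;\approx\; \gamma\,(10 + 1024) \;=\; 1034\,\gamma \text{ bits,}
    \]
    \[
      \text{versus } O(\gamma \log n) \approx 20\,\gamma \text{ bits to fetch a single bit with the basic PIR protocol.}
    \]
    ```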

    View the Full Publication
  • 01/01/2013Quality Control in Crowdsourcing Systems: Issues and DirectionsMohammad Allahbakhsh, Boualem Benatallah, Aleksandar Ignjatovic, Hamid Reza Motahari-Nezhad, Elisa Bertino, Schahram Dustdar

    As a new distributed computing model, crowdsourcing lets people leverage the crowd's intelligence and wisdom toward solving problems. This article proposes a framework for characterizing various dimensions of quality control in crowdsourcing systems, a critical issue. The authors briefly review existing quality-control approaches, identify open issues, and look to future research directions. In the Web extra, the authors discuss both design-time and runtime approaches in more detail.

    View the Full Publication
  • 01/01/2013OpenMPC: extended OpenMP for efficient programming and tuning on GPUsSeyong Lee, Rudolf Eigenmann

    General-purpose graphics processing units (GPGPUs) provide inexpensive, high performance platforms for compute-intensive applications. However, their programming complexity poses a significant challenge to developers. Even though the compute unified device architecture (CUDA) programming model offers better abstraction, developing efficient GPGPU code is still complex and error-prone. This paper proposes a directive-based, high-level programming model, called OpenMPC, which addresses both programmability and tunability issues on GPGPUs. We have developed a fully automatic compilation and user-assisted tuning system supporting OpenMPC. In addition to a range of compiler transformations and optimisations, the system includes tuning capabilities for generating, pruning, and navigating the search space of compilation variants. Evaluation using 14 applications shows that our system achieves 75% of the performance of the hand-coded CUDA programmes (92% if excluding one exceptional case).

    View the Full Publication
  • 01/01/2013Author disambiguation by hierarchical agglomerative clustering with adaptive stopping criterionLei Cen, Eduard Dragut, Luo Si, Mourad Ouzzani

    Entity disambiguation is an important step in many information retrieval applications. This paper proposes new research for entity disambiguation with a focus on name disambiguation in digital libraries. In particular, pairwise similarity is first learned for publications that share the same author name string (ANS), and then a novel Hierarchical Agglomerative Clustering approach with Adaptive Stopping Criterion (HACASC) is proposed to adaptively cluster a set of publications that share the same ANS into individual clusters of publications with different author identities. The HACASC approach utilizes a mixture of kernel ridge regressions to intelligently determine the threshold in clustering. This obtains more appropriate clustering granularity than a non-adaptive stopping criterion. We conduct a large-scale empirical study with a dataset of more than 2 million publication record pairs to demonstrate the advantage of the proposed HACASC approach.
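
    The following is a simplified sketch of agglomerative clustering with a stopping threshold. HACASC learns the threshold with a mixture of kernel ridge regressions; the fixed `stop_threshold` and the toy similarity below are assumed stand-ins for that adaptive step and for the learned pairwise similarity.

    ```python
    # Average-link hierarchical agglomerative clustering that stops merging
    # once the best inter-cluster similarity falls below a threshold.
    def hac(items, similarity, stop_threshold):
        clusters = [[x] for x in items]
        def cluster_sim(a, b):  # average-link similarity
            return sum(similarity(x, y) for x in a for y in b) / (len(a) * len(b))
        while len(clusters) > 1:
            (i, j), best = (0, 1), -1.0
            for i2 in range(len(clusters)):
                for j2 in range(i2 + 1, len(clusters)):
                    s = cluster_sim(clusters[i2], clusters[j2])
                    if s > best:
                        (i, j), best = (i2, j2), s
            if best < stop_threshold:   # HACASC would *predict* this cutoff
                break
            clusters[i] += clusters.pop(j)
        return clusters

    sim = lambda x, y: 1.0 if x[0] == y[0] else 0.1   # toy pairwise similarity
    print(hac(["ann", "amy", "bob"], sim, stop_threshold=0.5))
    # -> [['ann', 'amy'], ['bob']]
    ```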

    View the Full Publication
  • 01/01/2013Search result diversification in resource selection for federated searchDzung Hong, Luo Si

    Prior research in resource selection for federated search mainly focused on selecting a small number of information sources that are most relevant to a user query. However, result novelty and diversification are largely unexplored, which does not reflect the various kinds of information needs of users in real world applications.

    This paper proposes two general approaches to model both result relevance and diversification in selecting sources, in order to provide more comprehensive coverage of multiple aspects of a user query. The first approach focuses on diversifying the document ranking on a centralized sample database before selecting information sources under the framework of Relevant Document Distribution Estimation (ReDDE). The second approach first evaluates the relevance of information sources with respect to each aspect of the query, and then ranks the sources based on the novelty and relevance that they offer. Both approaches can be applied with a wide range of existing resource selection algorithms such as ReDDE, CRCS, CORI and Big Document. Moreover, this paper proposes a learning based approach to combine multiple resource selection algorithms for result diversification, which can further improve the performance. We propose a set of new metrics for resource selection in federated search to evaluate the diversification performance of different approaches. To the best of our knowledge, this is the first piece of work that addresses the problem of search result diversification in federated search. The effectiveness of the proposed approaches has been demonstrated by an extensive set of experiments on the federated search testbed of the Clueweb dataset.
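
    The sketch below illustrates the second approach's flavor with a greedy, MMR-style source ranking that trades off relevance against novelty across query aspects. The aspect-coverage scores, the trade-off weight, and the toy data are all assumptions; the paper derives aspect relevance from sampled documents (e.g., via ReDDE) rather than taking it as input.

    ```python
    # Greedy diversity-aware source selection: each pick balances total
    # relevance against coverage of aspects not yet served by earlier picks.
    def rank_sources(coverage, k, lam=0.7):
        """coverage: source -> {aspect: relevance score}."""
        selected, covered = [], {}
        while len(selected) < k and len(selected) < len(coverage):
            def gain(src):
                rel = sum(coverage[src].values())
                novelty = sum(max(0.0, s - covered.get(a, 0.0))
                              for a, s in coverage[src].items())
                return lam * rel + (1 - lam) * novelty
            best = max((s for s in coverage if s not in selected), key=gain)
            selected.append(best)
            for a, s in coverage[best].items():
                covered[a] = max(covered.get(a, 0.0), s)
        return selected

    cov = {"s1": {"jaguar car": 0.9}, "s2": {"jaguar car": 0.8},
           "s3": {"jaguar animal": 0.6}}
    print(rank_sources(cov, k=2))   # picks s1, then the novel s3 over s2
    ```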

    View the Full Publication
  • 01/01/2013Semantic hashing using tags and topic modelingQifan Wang, Dan Zhang, Luo Si

    It is an important research problem to design efficient and effective solutions for large scale similarity search. One popular strategy is to represent data examples as compact binary codes through semantic hashing, which has produced promising results with fast search speed and low storage cost. Many existing semantic hashing methods generate binary codes for documents by modeling document relationships based on similarity in a keyword feature space. Two major limitations in existing methods are: (1) Tag information is often associated with documents in many real world applications, but has not been fully exploited yet; (2) The similarity in keyword feature space does not fully reflect semantic relationships that go beyond keyword matching.

    This paper proposes a novel hashing approach, Semantic Hashing using Tags and Topic Modeling (SHTTM), to incorporate both the tag information and the similarity information from probabilistic topic modeling. In particular, a unified framework is designed for ensuring hashing codes to be consistent with tag information by a formal latent factor model and preserving the document topic/semantic similarity that goes beyond keyword matching. An iterative coordinate descent procedure is proposed for learning the optimal hashing codes. An extensive set of empirical studies on four different datasets has been conducted to demonstrate the advantages of the proposed SHTTM approach against several other state-of-the-art semantic hashing techniques. Furthermore, experimental results indicate that the modeling of tag information and utilizing topic modeling are beneficial for improving the effectiveness of hashing separately, while the combination of these two techniques in the unified framework obtains even better results.
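
    A minimal sketch of the semantic-hashing search pattern described above: documents become short binary codes and retrieval is a Hamming-distance scan. The random projection producing the codes is an assumed stand-in for SHTTM's learned codes, which additionally fold in tag information.

    ```python
    # Binary codes via random projections of topic vectors; nearest neighbor
    # search by Hamming distance over the codes.
    import numpy as np

    rng = np.random.default_rng(0)
    PROJ = rng.normal(size=(16, 50))          # 50-dim topic vectors -> 16 bits

    def to_code(topic_vec):
        return (PROJ @ topic_vec > 0).astype(np.uint8)

    def hamming(a, b):
        return int(np.count_nonzero(a != b))

    docs = rng.random((1000, 50))             # toy topic vectors
    codes = np.array([to_code(d) for d in docs])
    query = to_code(docs[42])
    best = min(range(len(codes)), key=lambda i: hamming(codes[i], query))
    print(best, hamming(codes[best], query))  # distance 0: doc 42 itself,
    # or another document whose 16-bit code happens to collide with it
    ```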

    View the Full Publication
  • 01/01/2013The Palm-tree Index: Indexing with the crowdAamer Mahmood, Walid G. Aref, Eduard Dragut, Saleh Basalamah

    Crowdsourcing services allow employing human intelligence in tasks that are difficult to accomplish with computers, such as image tagging and data collection. At a relatively low monetary cost and through web interfaces such as Amazon's Mechanical Turk (AMT), humans can act as computational operators in large systems. Recent work has been conducted to build database management systems that can harness the crowd's power in database operators, such as sort, join, count, etc. The fundamental problem of indexing within crowdsourced databases has not been studied. In this paper, we study the problem of tree-based indexing within crowd-enabled databases. We investigate the effect of various tree and crowdsourcing parameters on the quality of index operations. We propose new algorithms for index search, insert, and update.

    View the Full Publication
  • 01/01/2013Continuous aggregate nearest neighbor queriesHicham Elmongui, Mohamed Mokbel, Walid G. Aref

    This paper addresses the problem of continuous aggregate nearest-neighbor (CANN) queries for moving objects in spatio-temporal data stream management systems. A CANN query specifies a set of landmarks, an integer k, and an aggregate distance function f (e.g., min, max, or sum), where f computes the aggregate distance between a moving object and each of the landmarks. The answer to this continuous query is the set of k moving objects that have the smallest aggregate distance f. A CANN query may also be viewed as a combined set of nearest neighbor queries. We introduce several algorithms to continuously and incrementally answer CANN queries. Extensive experimentation shows that the proposed operators outperform the state-of-the-art algorithms by up to a factor of 3 and incur low memory overhead.
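
    To make the query semantics concrete, here is a straightforward, non-incremental evaluation of one CANN answer; the paper's contribution is answering this continuously and incrementally as objects move. The toy coordinates are illustrative.

    ```python
    # One-shot CANN evaluation: k objects with the smallest aggregate
    # distance f (min, max, or sum) over a set of landmarks.
    import heapq, math

    def cann(objects, landmarks, k, f=max):
        """objects: id -> (x, y); returns the k ids with smallest aggregate
        distance f over all landmarks."""
        def agg(pos):
            return f(math.dist(pos, lm) for lm in landmarks)
        return heapq.nsmallest(k, objects, key=lambda o: agg(objects[o]))

    objects = {"o1": (0, 0), "o2": (5, 5), "o3": (9, 9)}
    landmarks = [(1, 1), (2, 2)]
    print(cann(objects, landmarks, k=2, f=sum))   # -> ['o1', 'o2']
    ```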

    View the Full Publication
  • 01/01/2013A distributed cloud architecture for mobile multimedia servicesMuhamed Felemban, Saleh Basalamah, Arif Ghafoor

    Mobile cloud computing (MCC) is emerging as a new paradigm for supporting a broad range of multimedia services. MCC alleviates the burden of storage and computation on mobile devices. In this article, we describe design requirements and an architecture for MCC. The novelty of this architecture is an integrated cloudlet and base station subsystem that can meet application-level quality-of-service requirements and allow mobile resource provisioning close to the user. We present a layered architecture for MCC that elucidates the required functions and protocols. We also propose a connection handoff mechanism among cloudlets and discuss related resource management challenges for MCC.

    View the Full Publication
  • 01/01/2013Scaling large-data computations on multi-GPU acceleratorsAmit Sabne, Putt Sakdhnagool, Rudolf Eigenmann

    Modern supercomputers rely on accelerators to speed up highly parallel workloads. Intricate programming models, limited device memory sizes and overheads of data transfers between CPU and accelerator memories are among the open challenges that restrict the widespread use of accelerators. First, this paper proposes a mechanism and an implementation to automatically pipeline the CPU-GPU memory channel so as to overlap the GPU computation with the memory copies, alleviating the data transfer overhead. Second, in doing so, the paper presents a technique called Computation Splitting, COSP, that caters to arbitrary device memory sizes and automatically manages to run out-of-card OpenMP-like applications on GPUs. Third, a novel adaptive runtime tuning mechanism is proposed to automatically select the pipeline stage size so as to gain the best possible performance. The mechanism adapts to the underlying hardware in the starting phase of a program and chooses the pipeline stage size. The techniques are implemented in a system that is able to translate an input OpenMP program to multiple GPUs attached to the same host CPU. Experimentation on a set of nine benchmarks shows that, on average, the pipelining scheme improves the performance by 1.49x, while limiting the runtime tuning overhead to 3% of the execution time.
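
    A conceptual sketch of the pipelining idea using CPU threads: while chunk i is being "computed", chunk i+1 is already being "transferred". This is an assumed analogy only; the paper uses CUDA streams rather than threads, and its runtime tuner is what selects the chunk (pipeline stage) size automatically.

    ```python
    # Two-stage pipeline with a bounded queue: transfer and compute overlap,
    # standing in for overlapped host-to-device copies and GPU kernels.
    import threading, queue

    def transfer(chunks, q):
        for c in chunks:                 # stands in for CPU-to-GPU copies
            q.put(c)
        q.put(None)                      # end-of-stream marker

    def compute(q, results):
        while (c := q.get()) is not None:
            results.append(sum(x * x for x in c))   # stands in for a kernel

    chunks = [list(range(i, i + 4)) for i in range(0, 16, 4)]
    q, results = queue.Queue(maxsize=2), []         # 2 chunks in flight
    t1 = threading.Thread(target=transfer, args=(chunks, q))
    t2 = threading.Thread(target=compute, args=(q, results))
    t1.start(); t2.start(); t1.join(); t2.join()
    print(results)
    ```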

    View the Full Publication
  • 01/01/2013Don't be SCAREd: use SCalable Automatic REpairing with maximal likelihood and bounded changesMohamed Yakout, Laure Berti-Equille, Ahmed Elmagarmid

    Various computational procedures or constraint-based methods for data repairing have been proposed over the last decades to identify errors and, when possible, correct them. However, these approaches have several limitations including the scalability and quality of the values to be used in replacement of the errors. In this paper, we propose a new data repairing approach that is based on maximizing the likelihood of replacement data given the data distribution, which can be modeled using statistical machine learning techniques. This is a novel approach combining machine learning and likelihood methods for cleaning dirty databases by value modification. We develop a quality measure of the repairing updates based on the likelihood benefit and the amount of changes applied to the database. We propose SCARE (SCalable Automatic REpairing), a systematic scalable framework that follows our approach. SCARE relies on a robust mechanism for horizontal data partitioning and a combination of machine learning techniques to predict the set of possible updates. Due to data partitioning, several updates can be predicted for a single record based on local views on each data partition. Therefore, we propose a mechanism to combine the local predictions and obtain accurate final predictions. Finally, we experimentally demonstrate the effectiveness, efficiency, and scalability of our approach on real-world datasets in comparison to recent data cleaning approaches.

    View the Full Publication
  • 01/01/2013NADEEF: a commodity data cleaning systemMichele Dallachiesa, Amr Abaid, Ahmed Eldawy, Ahmed Elmagarmid, Mourad Ouzzani, Ihab Ilyas, Nan Tang

    Despite the increasing importance of data quality and the rich theoretical and practical contributions in all aspects of data cleaning, there is no single end-to-end off-the-shelf solution to (semi-)automate the detection and the repairing of violations w.r.t. a set of heterogeneous and ad-hoc quality constraints. In short, there is no commodity platform similar to general purpose DBMSs that can be easily customized and deployed to solve application-specific data quality problems. In this paper, we present NADEEF, an extensible, generalized and easy-to-deploy data cleaning platform. NADEEF distinguishes between a programming interface and a core to achieve generality and extensibility. The programming interface allows the users to specify multiple types of data quality rules, which uniformly define what is wrong with the data and (possibly) how to repair it through writing code that implements predefined classes. We show that the programming interface can be used to express many types of data quality rules beyond the well known CFDs (FDs), MDs and ETL rules. Treating user implemented interfaces as black-boxes, the core provides algorithms to detect errors and to clean data. The core is designed in a way to allow cleaning algorithms to cope with multiple rules holistically, i.e. detecting and repairing data errors without differentiating between various types of rules. We showcase two implementations for core repairing algorithms. These two implementations demonstrate the extensibility of our core, which can also be replaced by other user-provided algorithms. Using real-life data, we experimentally verify the generality, extensibility, and effectiveness of our system.
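
    The sketch below illustrates the "programming interface" idea: each rule is a small user-implemented class, and the core treats every rule as a black box with detect/repair methods. The class and method names are hypothetical, not NADEEF's actual API, and the majority-value repair is a toy policy.

    ```python
    # A rule as a black box: the core only calls detect() and repair().
    from abc import ABC, abstractmethod

    class Rule(ABC):
        @abstractmethod
        def detect(self, table):
            """Return the row indices that violate the rule."""
        @abstractmethod
        def repair(self, table, violations):
            """Mutate the table to fix the given violations."""

    class ZipImpliesCity(Rule):           # an FD-style rule: zip -> city
        def detect(self, table):
            seen, bad = {}, []
            for i, row in enumerate(table):
                if row["zip"] in seen and seen[row["zip"]] != row["city"]:
                    bad.append(i)
                seen.setdefault(row["zip"], row["city"])
            return bad
        def repair(self, table, violations):
            first = {}
            for row in table:
                first.setdefault(row["zip"], row["city"])
            for i in violations:
                table[i]["city"] = first[table[i]["zip"]]

    table = [{"zip": "47907", "city": "West Lafayette"},
             {"zip": "47907", "city": "Lafayette"}]
    rule = ZipImpliesCity()
    rule.repair(table, rule.detect(table))
    print(table[1]["city"])   # -> "West Lafayette"
    ```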

    View the Full Publication
  • 01/01/2013NADEEF: A Generalized Data Cleaning SystemAmr Abaid, Ahmed Elmagarmid, Ihab Ilyas, Mourad Ouzzani, Jorge-Arnulfo Quiane-Ruiz, Nan Tang, Si Yin

    We present NADEEF, an extensible, generic and easy-to-deploy data cleaning system. NADEEF distinguishes between a programming interface and a core to achieve generality and extensibility. The programming interface allows users to specify data quality rules by writing code that implements prede ned classes. These classes uniformly dene what is wrong with the data and (possibly) how to fix it. We will demonstrate the following features provided by NADEEF. (1) Heterogeneity: The programming interface can be used to express many types of data quality rules beyond the well known CFDs (FDs), MDs and ETL rules. (2) Interdependency: The core algorithms can interleave multiple types of rules to detect and repair data errors. (3) Deployment and extensibility: Users can easily customize NADEEF by de ning new types of rules, or by extending the core. (4) Metadata management and data custodians: We show a live data quality dashboard to e ectively involve users in the data cleaning process.

    View the Full Publication
  • 01/01/2013A fast parallel maximum clique algorithm for large sparse graphs and temporal strong componentsRyan Rossi, David Gleich, Assefaw Gebramedhin

    We propose a fast, parallel, maximum clique algorithm for large, sparse graphs that is designed to exploit characteristics of social and information networks. We observe roughly linear runtime scaling over graphs between 1000 vertices and 100M vertices. In a test with a 1.8 billion-edge social network, the algorithm finds the largest clique in about 20 minutes. For social networks, in particular, we found that using the core number of a vertex in combination with a good heuristic clique finder efficiently removes the vast majority of the search space. In addition, we parallelize the exploration of the search tree. In the algorithm, processes immediately communicate changes to upper and lower bounds on the size of the maximum clique, which occasionally results in a super-linear speedup because vertices with especially large search spaces can be pruned by other processes. We use this clique finder to investigate the size of the largest temporal strong components in dynamic networks, which requires finding the largest clique in a particular temporal reachability graph.
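
    For orientation, here is a compact sequential branch-and-bound maximum clique finder of the kind being parallelized and pruned above. The shared-bound communication between processes and the core-number preprocessing that give the paper its large speedups are omitted; this is a baseline sketch, not the paper's algorithm.

    ```python
    # Branch-and-bound maximum clique: prune any branch whose clique plus
    # remaining candidates cannot beat the best clique found so far.
    def max_clique(adj):
        best = []
        def expand(clique, cand):
            nonlocal best
            if not cand and len(clique) > len(best):
                best = clique[:]
            while cand:
                if len(clique) + len(cand) <= len(best):
                    return                       # bound: branch cannot win
                v = cand.pop()
                expand(clique + [v], cand & adj[v])
        expand([], set(adj))
        return best

    adj = {  # triangle 1-2-3 plus a pendant vertex 4
        1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3},
    }
    print(sorted(max_clique(adj)))   # -> [1, 2, 3]
    ```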

    View the Full Publication
  • 11/01/2012Using automated individual white-list to protect web digital identitiesWeili Han, Ye Cao, Elisa Bertino, Jianming Yong

    Theft attacks on web digital identities, e.g., phishing and pharming, can result in severe losses for users and vendors, and may even hold users back from using online services, especially e-business services. In this paper, we propose an approach, referred to as automated individual white-list (AIWL), to protect users' web digital identities. AIWL leverages a Naïve Bayesian classifier to automatically maintain an individual white-list for a user. If the user tries to submit his or her account information to a web site that does not match the white-list, AIWL will alert the user of the possible attack. Furthermore, AIWL keeps track of the features of login pages (e.g., IP addresses, document object model (DOM) paths of input widgets) in the individual white-list. By checking the legitimacy of these features, AIWL can efficiently defend users against hard attacks, especially pharming, and even dynamic pharming. Our experimental results and user studies show that AIWL is an efficient tool for protecting web digital identities.
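
    A toy version of the white-list check: before account data is submitted, the destination must match the stored profile of the login page. The profile fields (URL, IP, DOM path of the input widget) come from the description above; the exact matching logic and data are simplifications.

    ```python
    # White-list lookup before submitting credentials; a mismatch on any
    # tracked feature triggers an alert (here, returning False).
    whitelist = {
        "mybank.example.com": {
            "ip": "203.0.113.7",
            "login_form_dom": "/html/body/form[1]/input[2]",
        }
    }

    def safe_to_submit(host, resolved_ip, form_dom_path):
        profile = whitelist.get(host)
        if profile is None:
            return False                      # unknown site: alert the user
        return (profile["ip"] == resolved_ip            # catches pharming
                and profile["login_form_dom"] == form_dom_path)

    print(safe_to_submit("mybank.example.com", "203.0.113.7",
                         "/html/body/form[1]/input[2]"))   # True
    print(safe_to_submit("mybank.example.com", "198.51.100.9",
                         "/html/body/form[1]/input[2]"))   # False: bad IP
    ```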

    View the Full Publication
  • 11/01/2012Interactive Web-Based Breastfeeding Monitoring FeasibilityAzza Ahmed, Mourad Ouzzani

    BACKGROUND: Strategies that promote higher exclusive breastfeeding rate and duration are highly recommended. To date, no study has tested the feasibility of Web-based monitoring among breastfeeding mothers.

    GOALS: To develop an interactive Web-based breastfeeding monitoring system (LACTOR) and examine its feasibility, usability, and acceptability among breastfeeding mothers.

    METHODS: A prospective, descriptive, mixed-methods study was conducted. Mothers who met the study inclusion criteria were recruited from mother infant units in 2 Midwestern hospitals in the United States. Mothers were asked to enter their breastfeeding data daily through the system for 30 days and then submit an online exit survey. This survey consisted of a system usability scale and mothers' perceptions form. Twenty-six mother/infant dyads completed the study.

    RESULTS: The feasibility of LACTOR was established by mothers' compliance in entering their breastfeeding data. The mean was 8.87 (SD = 1.21) daily entries, and the range was 6-13 times per day. The usability scale total mean score was 3.35 (SD = 0.33; scale range 0-4). Ninety-two percent of the mothers thought that they did not need to learn many skills before they started to use LACTOR and did not need any technical support. Mothers reported that the monitoring was beneficial and gave them the chance to track their infants' feeding patterns and detect any problems early.

    CONCLUSIONS: This study demonstrated the feasibility of LACTOR, and it was user-friendly and acceptable among mothers. Further studies to test its effect on breastfeeding outcomes are needed.

    View the Full Publication
  • 10/29/2012Ionomics Atlas: a tool to explore interconnected ionomic, genomic and environmental dataEduard Dragut, Amgad Madkour, Mohamed Nabeel, Peter Baker, Mourad Ouzzani, David Salt

    Ionomics Atlas facilitates access, analysis and interpretation of an existing large-scale heterogeneous dataset consisting of ionomic (elemental composition of an organism), genetic (heritable changes in the DNA of an organism) and geographic information (geographic location, altitude, climate, soil properties, etc.). Ionomics Atlas allows connections to be made between the genetic regulation of the ionome of plant populations and their landscape distribution, allowing scientists to investigate the role of natural ionomic variation in the adaptation of populations to varied environmental conditions in the landscape. The goal of the Ionomics Atlas is twofold: (1) to allow both novice and expert users to easily access and explore layers of interconnected ionomic, genomic and environmental data; and (2) to facilitate hypothesis generation and testing by providing direct querying and browsing of the data as well as different display modes of the results.

    View the Full Publication
  • 10/01/2012Preserving privacy of feedback providers in decentralized reputation systemsOmar Hasan, Lionel Brunie, Elisa Bertino

    Reputation systems make the users of a distributed application accountable for their behavior. The reputation of a user is computed as an aggregate of the feedback provided by other users in the system. Truthful feedback is clearly a prerequisite for computing a reputation score that accurately represents the behavior of a user. However, it has been observed that users often hesitate in providing truthful feedback, mainly due to the fear of retaliation. We present a decentralized privacy preserving reputation protocol that enables users to provide feedback in a private and thus uninhibited manner. The protocol has linear message complexity, which is an improvement over comparable decentralized reputation protocols. Moreover, the protocol allows users to quantify and maximize the probability that their privacy will be preserved.

    View the Full Publication
  • 10/01/2012Reducing the complexity of BGP stability analysis with hybrid combinatorial-algebraic modelsDebbie Perouli, Stefano Vissicchio, Alexander Gurney, Olaf Maennel, Timmothy Griffin, Iain Phillips, Sonia Fahmy, Cristel Pelsser

    Routing stability and correctness in the Internet have long been a concern. Despite this, few theoretical frameworks have been proposed to check BGP configurations for convergence and safety. The most popular approach is based on the Stable Paths Problem (SPP) model. Unfortunately, SPP requires enumeration of all possible control-plane paths, which is infeasible in large networks. In this work, we study how to apply algebraic frameworks to the BGP configuration checking problem. We propose an extension of the Stratified Shortest Path Problem (SSPP) model that has a similar expressive power to SPP, but enables more efficient checking of configuration correctness. Our approach remains valid when BGP policies are applied to iBGP sessions - a case which is often overlooked by previous work, although common in today's Internet. While this paper focuses mainly on iBGP problems, our methodology can be extended to eBGP if operators are willing to share their local-preference configurations.

    View the Full Publication
  • 10/01/2012An Analytic Approach to People Evaluation in Crowdsourcing SystemsMohammad Allahbakhsh, Aleksandar Ignjatovic, Boualem Benatallah, Seyed-Mehdi-Reza Beheshti, Elisa Bertino

    Worker selection is a significant and challenging issue in crowdsourcing systems. Such selection is usually based on an assessment of the reputation of the individual workers participating in such systems. However, assessing the credibility and adequacy of such calculated reputation is a real challenge. In this paper, we propose an analytic model which leverages the values of the tasks completed, the credibility of the evaluators of the results of the tasks, and the time of evaluation of these results in order to calculate an accurate and credible reputation rank for participating workers and a fairness rank for evaluators. The model has been implemented and experimentally validated.
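
    The following is a hedged illustration of combining the three ingredients named above: task value, evaluator credibility, and evaluation time. The exponential recency decay and the weighting form are assumptions for illustration, not necessarily the paper's exact functions.

    ```python
    # Weighted worker reputation: each evaluation counts in proportion to
    # the task's value, the evaluator's credibility, and its recency.
    import math, time

    def reputation(evaluations, now=None, half_life=30 * 86400):
        """evaluations: list of (score in [0,1], task_value,
        evaluator_credibility, timestamp). Returns a value in [0,1]."""
        now = now or time.time()
        num = den = 0.0
        for score, value, credibility, ts in evaluations:
            w = value * credibility * math.exp(-(now - ts) * math.log(2) / half_life)
            num += w * score
            den += w
        return num / den if den else 0.0

    now = time.time()
    evals = [(1.0, 5.0, 0.9, now - 86400),        # recent, credible, valuable
             (0.2, 1.0, 0.3, now - 90 * 86400)]   # old, low-credibility
    print(round(reputation(evals, now), 3))  # dominated by the first entry
    ```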

    View the Full Publication
  • 10/01/2012Adaptive data management for self-protecting objects in cloud computing systemsAnna Squicciarini, Giuseppe Petracca, Elisa Bertino

    While Cloud data services are a growing, successful business and computing paradigm, data privacy and security are major concerns. One critical problem is to ensure that data owners' policies are honored, regardless of where the data is physically stored and how often it is accessed and modified. This scenario calls for an important requirement to be satisfied: data should be managed in accordance with owners' preferences, Cloud providers' service agreements, and the local regulations that may apply. In this work we propose innovative policy enforcement techniques for adaptive sharing of users' outsourced data. We introduce the notion of autonomous security-aware objects that, by means of object-oriented programming techniques, encapsulate sensitive resources and assure their protection. Our evaluation demonstrates that our approach is effective.

    View the Full Publication
  • 10/01/2012Secure sensor network SUM aggregation with detection of malicious nodesSunoh Choi, Gabriel Ghinita, Elisa Bertino

    In-network aggregation is an essential operation which reduces communication overhead and power consumption of resource-constrained sensor network nodes. Sensor nodes are typically organized into an aggregation tree, whereby aggregator nodes collect data from multiple data source nodes, and perform a reduction operation such as sum, average, minimum, etc. The result is then forwarded to other aggregators higher in the hierarchy toward a base station (or sink node) that receives the final outcome of the in-network computation. However, despite its performance benefits, aggregation introduces several difficult security challenges with respect to data confidentiality, integrity and authenticity. In today's outsource-centric computing environments, the aggregation task may be delegated to a third party that is not fully trusted. In addition, even in the absence of outsourcing, nodes may be compromised by a malicious adversary with the purpose of altering aggregation results. To defend against such threats, several mechanisms have been proposed, most of which devise aggregation schemes that rely on cryptography to detect that an attack has occurred. Although they prevent the sink from accepting an incorrect result, such techniques are vulnerable to denial-of-service if a compromised node alters the aggregation result in each round. Several more recent approaches also identify the malicious nodes and exclude them from future computation rounds. However, these incur high communication overhead as they require flooding or other expensive communication models to connect individual nodes with the base station. We propose a flexible aggregation structure (FAS) and an advanced ring structure (ARS) topology that allow secure aggregation and efficient identification of malicious aggregator nodes for the SUM operation. Our scheme uses only symmetric key cryptography, outperforms existing solutions in terms of performance, and guarantees that the aggregate result is correct and that malicious nodes are identified.
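
    To fix ideas, here is plain in-network SUM aggregation over a tree, the operation being secured above. FAS/ARS add symmetric-key verification on top of this flow so the sink can check the total and localize a lying aggregator; none of that machinery appears in this sketch.

    ```python
    # Recursive SUM over an aggregation tree: each aggregator adds its
    # children's subtotals before forwarding toward the sink.
    def aggregate_sum(node, children, readings):
        """Sum the subtree rooted at `node`; children: node -> child list."""
        total = readings.get(node, 0)
        for child in children.get(node, []):
            total += aggregate_sum(child, children, readings)
        return total

    children = {"sink": ["a1", "a2"], "a1": ["s1", "s2"], "a2": ["s3"]}
    readings = {"s1": 7, "s2": 5, "s3": 10}           # leaf sensor values
    print(aggregate_sum("sink", children, readings))  # -> 22
    ```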

    View the Full Publication
  • 09/14/2012The bench scientist's guide to statistical analysis of RNA-Seq dataCraig R. Yendrek, Elizabeth A. Ainsworth, Jyothi Thimmapuram

    RNA sequencing (RNA-Seq) is emerging as a highly accurate method to quantify transcript abundance. However, analyses of the large data sets obtained by sequencing the entire transcriptome of organisms have generally been performed by bioinformatics specialists. Here we provide a step-by-step guide and outline a strategy using currently available statistical tools that results in a conservative list of differentially expressed genes. We also discuss potential sources of error in RNA-Seq analysis that could alter interpretation of global changes in gene expression.

    View the Full Publication
  • 09/01/2012SYSTEM ON CHIP AND METHOD FOR CRYPTOGRAPHY USING A PHYSICALLY UNCLONABLE FUNCTIONMichael Kirkpatrick, Sam Kerr, Elisa Bertino

    A system and method for performing cryptographic functions in hardware using read-N keys, comprising a cryptographic core, a seed register, a physically unclonable function (PUF), an error-correction core, a decryption register, and an encryption register. The PUF is configured to receive a seed value as an input and to generate a key as an output. The error-correction core is configured to transmit the key to the cryptographic core. The encryption register and decryption register are configured to receive the seed value and the output. The system, a PUF ROK, is configured to generate keys that can be used N times to perform cryptographic functions.

    View the Full Publication
  • 09/01/2012Modeling the Risk & Utility of Information Sharing in Social NetworksMohamed R Fouad, Khaled Elbassioni, Elisa Bertino

    With the widespread use of social networks, the risk of information sharing has become inevitable. Sharing a user's particular information in social networks is an all-or-none decision. Users receiving friendship invitations from others may decide to accept this request and share their information, or reject it, in which case none of their information will be shared. Access control in social networks is a challenging topic. Social network users would want to determine the optimum level of details at which they share their personal information with other users based on the risk associated with the process. In this paper, we formulate the problem of data sharing in social networks using two different models: (i) a model based on diffusion kernels, and (ii) a model based on access control. We show that it is hard to apply the former in practice and explore the latter. We prove that determining the optimal levels of information sharing is an NP-hard problem and propose an approximation algorithm that determines to what extent social network users share their own information. We propose a trust-based model to assess the risk of sharing sensitive information and use it in the proposed algorithm. Moreover, we prove that the algorithm could be solved in polynomial time. Our results rely heavily on adopting the supermodularity property of the risk function, which allows us to employ techniques from convex optimization. To evaluate our model, we conduct a user study to collect demographic information of several social network users and get their perceptions on risk and trust. In addition, through experimental studies on synthetic data, we compare our proposed algorithm with the optimal algorithm both in terms of risk and time. We show that the proposed algorithm is scalable and that the sacrifice in risk is outweighed by the gain in efficiency.

    View the Full Publication
  • 08/01/2012Defending against insider threats and internal data leakageElisa Bertino, Ilsun You, Gabriele Lenzini, Marek Ogiela

    In the last decade, computer science researchers have been working hard to prevent attacks against the security of information systems. Different adversary models have incarnated the malicious entities against which researchers have defined security properties, identified security vulnerabilities, and engineered security defenses. These adversaries were usually intruders, that is, outsiders trying to break into a system's defenses.

    View the Full Publication
  • 08/01/2012Privacy preserving delegated access control in the storage as a service modelMohamed Nabeel, Elisa Bertino

    Current approaches for enforcing fine-grained access control and confidentiality to sensitive data hosted in the cloud are based on selectively encrypting the data before uploading it to the cloud. In such an approach, organizations have to enforce authorization policies through encryption. They thus incur high communication and computation costs to manage keys and encryptions whenever user credentials or organizational authorization policies change. Ideally, organizations should use encryption only in order to hide the data from the cloud, whereas the cloud should be in charge of enforcing authorization policies on the hidden data in order to minimize the overhead at organizations. In this paper, we propose a novel approach for delegating privacy-preserving fine-grained access enforcement to the cloud. Our approach is based on a recent key management scheme that allows users whose attributes satisfy a certain policy to derive the data encryption keys only for the content they are allowed to access from the cloud. Our approach preserves the confidentiality of the data and the user privacy from the cloud, while delegating most of the access control enforcement to the cloud. Further, in order to reduce the cost of re-encryption required whenever the access control policies change, our approach uses incremental encryption techniques.

    View the Full Publication
  • 08/01/2012Panel: Using information re-use and integration principles in big dataElisa Bertino, Stuart Rubin, Taghi Khoshgoftaar, Bhavani Thuraisingham, James McCaffrey

    More advanced and complex applications in social networks, gaming, entertainment, medical research, and GIS, to name a few, drive the requirements to process large data sets, commonly called BigData. As mentioned in an IBM report [1], BigData is characterized not only by volume (terabytes and petabytes) but also by velocity (speed requirements) and variety (types of data sets). The manipulation of such large data sets requires massively parallel software running on a large number of servers.

    View the Full Publication
  • 08/01/2012Auxin and ABA act as central regulators of developmental networks associated with paradormancy in Canada thistle (Cirsium arvense).JV Anderson, M Dogramaci, DP Horvath, ME Foley, WS Chao, JC Suttle, Jyothi Thimmapuram, AG Hernandez, S Ali, MA Mikel

    Dormancy in underground vegetative buds of Canada thistle, an herbaceous perennial weed, allows escape from current control methods and contributes to its invasive nature. In this study, ~65 % of root sections obtained from greenhouse propagated Canada thistle produced new vegetative shoots by 14 days post-sectioning. RNA samples obtained from sectioned roots incubated 0, 24, 48, and 72 h at 25°C under 16:8 h light-dark conditions were used to construct four MID-tagged cDNA libraries. Analysis of in silico data obtained using Roche 454 GS-FLX pyrosequencing technologies identified molecular networks associated with paradormancy release in underground vegetative buds of Canada thistle. Sequencing of two replicate plates produced ~2.5 million ESTs with an average read length of 362 bases. These ESTs assembled into 67358 unique sequences (21777 contigs and 45581 singlets) and annotation against the Arabidopsis database identified 15232 unigenes. Among the 15232 unigenes, we identified processes enriched with transcripts involved in plant hormone signaling networks. To follow-up on these results, we examined hormone profiles in roots, which identified changes in abscisic acid (ABA) and ABA metabolites, auxins, and cytokinins post-sectioning. Transcriptome and hormone profiling data suggest that interaction between auxin- and ABA-signaling regulate paradormancy maintenance and release in underground adventitious buds of Canada thistle. Our proposed model shows that sectioning-induced changes in polar auxin transport alters ABA metabolism and signaling, which further impacts gibberellic acid signaling involving interactions between ABA and FUSCA3. Here we report that reduced auxin and ABA-signaling, in conjunction with increased cytokinin biosynthesis post-sectioning supports a model where interactions among hormones drives molecular networks leading to cell division, differentiation, and vegetative outgrowth.

    View the Full Publication
  • 08/01/2012Efficient and Practical Approach for Private Record LinkageMohamed Yakout, Mikhail J. Atallah, Ahmed Elmagarmid

    Record linkage is used to associate entities from multiple data sources. For example, two organizations contemplating a merger may want to know how common their customer bases are so that they may better assess the benefits of the merger. As another example, a database of people who are forbidden from a certain activity by regulators may need to be compared to a list of people engaged in that activity. The autonomous entities who wish to carry out the record matching computation are often reluctant to fully share their data; they fear losing control over its subsequent dissemination and usage, or they want to ensure privacy because the data is proprietary or confidential, and/or they are cautious simply because privacy laws forbid its disclosure or regulate the form of that disclosure. In such cases, the problem of carrying out the linkage computation without full data exchange has been called private record linkage. Previous private record linkage techniques have made use of a third party. We provide efficient techniques for private record linkage that improve on previous work in that (1) our techniques make no use of a third party, and (2) they achieve much better performance than previous schemes in terms of their execution time while maintaining acceptable quality of output compared to nonprivacy settings. Our protocol consists of two phases. The first phase primarily produces candidate record pairs for matching, by carrying out a very fast (but not accurate) matching between such pairs of records. The second phase is a novel protocol for efficiently computing distances between each candidate pair (without any expensive cryptographic operations such as modular exponentiations). Our experimental evaluation of our approach validates these claims.
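
    Below is a non-private sketch of the two-phase structure only: a cheap, coarse pass proposes candidate pairs, then a more careful distance computation runs on the survivors. The paper performs both phases under cryptographic protocols with no third party; that machinery, and the specific distance functions used, are omitted here as assumptions.

    ```python
    # Phase 1: fast token-overlap filtering; Phase 2: Jaccard distance on
    # the surviving candidate pairs.
    def tokens(rec):
        return set(rec.lower().split())

    def link(records_a, records_b, candidate_overlap=1, threshold=0.5):
        matches = []
        for a in records_a:
            for b in records_b:
                common = tokens(a) & tokens(b)
                if len(common) < candidate_overlap:
                    continue                         # phase 1: cheap filter
                jaccard = len(common) / len(tokens(a) | tokens(b))
                if jaccard >= threshold:             # phase 2: real distance
                    matches.append((a, b, round(jaccard, 2)))
        return matches

    print(link(["John A Smith", "Mary Jones"],
               ["john smith", "Mary K Jones"]))
    # -> [('John A Smith', 'john smith', 0.67),
    #     ('Mary Jones', 'Mary K Jones', 0.67)]
    ```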

    View the Full Publication
  • 08/01/2012ARES Conference - Message from ARES Conference OfficersElisa Bertino, A Min Tjoa, Gerald Quirchmayr

    The Seventh International Conference on Availability, Reliability and Security (ARES 2012) brings together researchers and practitioners in the field of dependability. ARES 2012 highlights the various aspects of dependability, following the tradition of previous ARES conferences, again with a special focus on the crucial linkage between availability, reliability and security. ARES aims at contributing to an intensive discussion of research issues in the field of dependability as an integrative concept that in its core comprises research contributions from availability, safety, confidentiality, integrity, maintainability and security and their different areas of application. This year's conference emphasizes the interplay between foundations and practical issues of research in information security and will also look at upcoming research challenges. ARES 2012 is dedicated to expanding collaborations between different sub-disciplines and to strengthening the community for further research which previous ARES conferences have started to build. We are very happy to welcome two well-known keynote speakers: Annie I. Antón and Chenxi Wang. From many submissions we have selected the 15 best for presentation as full papers. The quality of submissions has steadily improved over the last years, and the conference officers sometimes faced difficult decisions when selecting which papers should be accepted. This year's acceptance rate is 15% for full papers. In addition, several workshops and short papers show ongoing research projects and offer interesting starting points for discussions.

    View the Full Publication
  • 08/01/2012IEEE IRI 2012 INTERNATIONAL TECHNICAL PROGRAM COMMITTEEChengcui Zhang, Elisa Bertino, Bhavani Thuraisingham, James Joshi

    Welcome to the proceedings of the 13th IEEE International Conference on Information Reuse and Integration (IEEE IRI 2012) in Las Vegas, Nevada, USA. Information Reuse and Integration (IRI) aims at maximizing the reuse of information by creating simple, rich, and reusable knowledge representations and consequently explores strategies for integrating this knowledge into legacy systems. IRI plays a pivotal role in the capture, representation, maintenance, integration, validation, and extrapolation of information; and applies both information and knowledge for enhancing decision-making in various application domains. During more than a decade of conferences, IRI has established itself as an internationally renowned forum for researchers and practitioners to exchange ideas, connect with colleagues, and advance the state of the art and practice of current and future research in information reuse and integration.

    View the Full Publication
  • 08/01/2012Secure provenance transmission for streaming dataSalmin Sultana, Mohamed Shehab, Elisa Bertino

    Many application domains, such as real-time financial analysis, e-healthcare systems, and sensor networks, are characterized by continuous data streaming from multiple sources and through intermediate processing by multiple aggregators. Keeping track of data provenance in such a highly dynamic context is an important requirement, since data provenance is a key factor in assessing data trustworthiness, which is crucial for many applications. Provenance management for streaming data requires addressing several challenges, including the assurance of high processing throughput, low bandwidth consumption, storage efficiency, and secure transmission. In this paper, we propose a novel approach to securely transmit provenance for streaming data (focusing on sensor networks) by embedding provenance into the inter-packet timing domain while addressing the above-mentioned issues. As provenance is hidden in another host medium, our solution can be conceptualized as a watermarking technique. However, unlike traditional watermarking approaches, we embed provenance over the inter-packet delays (IPDs) rather than in the sensor data themselves, hence avoiding the problem of data degradation due to watermarking. Provenance is extracted by the data receiver utilizing an optimal threshold-based mechanism which minimizes the probability of provenance decoding errors. The resiliency of the scheme against outside and inside attackers is established through an extensive security analysis. Experiments show that our technique can recover provenance up to a certain level against perturbations to inter-packet timing characteristics.
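
    An idealized encoder/decoder for the core idea: provenance bits ride on inter-packet delays, and the receiver decodes by thresholding. Real channels add jitter, which is why the paper derives an optimal threshold; the fixed delay values below are illustrative assumptions.

    ```python
    # Provenance bits mapped onto inter-packet delays (IPDs) and decoded
    # with a simple threshold.
    SHORT, LONG, THRESHOLD = 0.010, 0.030, 0.020   # seconds (assumed)

    def encode(bits):
        """Map provenance bits to the IPDs of consecutive packets."""
        return [LONG if b else SHORT for b in bits]

    def decode(ipds):
        return [1 if d > THRESHOLD else 0 for d in ipds]

    provenance = [1, 0, 1, 1]            # e.g., an encoded node path
    ipds = encode(provenance)
    assert decode(ipds) == provenance    # lossless without jitter
    print(decode([0.031, 0.012, 0.029, 0.026]))  # survives mild jitter
    ```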

    View the Full Publication
  • 08/01/2012On XACML's adequacy to specify and to enforce HIPAAOmar Chowdhury, Haining Chen, Jianwei Niu, Ninghui Li, Elisa Bertino

    In the medical sphere, personal and medical information is collected, stored, and transmitted for various purposes, such as continuity of care, rapid formulation of diagnoses, and billing. Many of these operations must comply with federal regulations like the Health Insurance Portability and Accountability Act (HIPAA). To this end, we need a specification language that can precisely capture the requirements of HIPAA. We also need an enforcement engine that can enforce the privacy policies specified in the language. In the current work, we evaluate eXtensible Access Control Markup Language (XACML) as a candidate specification language for HIPAA privacy rules. We evaluate XACML based on the set of features required to sufficiently express HIPAA, proposed by a prior work. We also discuss which of the features necessary for expressing HIPAA are missing in XACML. We then present high-level designs of how to enhance XACML.

    View the Full Publication
  • 07/01/2012A role-involved purpose-based access control modelMd. Enamul Kabir, Hua Wang, Elisa Bertino

    This paper presents a role-involved purpose-based access control (RPAC) model, where a conditional purpose is defined as the intention of data accesses or usages under certain conditions. RPAC allows users to use data for a certain purpose under conditions (for instance, Tony agrees that his income information can be used for marketing purposes provided his name is removed). The structure of the RPAC model is investigated after defining access purposes, intended purposes and conditional purposes. An algorithm is developed with role-based access control (RBAC) to achieve the compliance computation between access purposes (related to data access) and intended purposes (related to data objects). Access purpose authorization and authentication in the RPAC model are studied with the hierarchical purpose structure. According to the model, more information from data providers can be extracted while at the same time assuring privacy, which maximizes the usability of consumers' data. It extends role-based access control models to a further coverage of privacy preservation in database management systems by adopting purposes and conditional intended purposes, and achieves fine-grained access control. The work in this paper helps enterprises to circulate a clear privacy promise, and to collect and manage user preferences and consent.
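
    A minimal purpose-tree compliance check, to make the hierarchical structure concrete: an access purpose complies with an intended purpose if it equals it or is one of its descendants. Conditional purposes and the RBAC integration from the paper are left out; the purpose names below are illustrative.

    ```python
    # Purpose hierarchy as a tree; compliance = access purpose lies in the
    # subtree of some intended purpose.
    purpose_tree = {                    # parent -> children (toy hierarchy)
        "General": ["Marketing", "Research"],
        "Marketing": ["Direct-Email", "Third-Party"],
    }

    def descendants(p):
        out = {p}
        for child in purpose_tree.get(p, []):
            out |= descendants(child)
        return out

    def complies(access_purpose, intended_purposes):
        return any(access_purpose in descendants(ip) for ip in intended_purposes)

    print(complies("Direct-Email", ["Marketing"]))   # True
    print(complies("Research", ["Marketing"]))       # False
    ```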

    View the Full Publication
  • 07/01/2012Extensible Context-aware Stream Processing on the CloudWalid G. Aref

    Rationale and Challenges for Massive Data Stream Processing on the Cloud: The ubiquity of mobile devices, location services, and sensor pervasiveness, e.g., as in smart city initiatives, call for scalable computing platforms and massively parallel architectures to process the vast amounts of generated streamed data. Cloud computing provides some of the features needed for these massive data streaming applications. For example, the dynamic allocation of resources on an as-needed basis addresses the variability in sensor and location data distributions over time. However, today's cloud computing platforms lack very important features that are necessary in order to support the massive amounts of data streams envisioned by the massive and ubiquitous dissemination of sensors and mobile devices of all sorts in smart-city-scale applications.

    View the Full Publication
  • 06/01/2012Collaborative Computing: Networking, Applications and WorksharingJames Joshi, Elisa Bertino, Calton Pu, Heri Ramampiaro

    Recent advances in computing have contributed to the growing interconnection of our world, including 3G/4G wireless networks, web 2.0 technologies, and computing clouds, just to mention a few. The potential for collaboration among various components has exceeded the current capabilities of traditional approaches to system integration and interoperability. As the world heads towards unlimited connectivity and global mobile computing, collaboration becomes one of the fundamental challenges. We view collaborative computing as the glue that brings the components together and also the lubricant that makes them work together. The 4th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom) serves as a premier international forum for discussion among researchers and practitioners interested in collaborative networking, technology and systems, and applications.

    View the Full Publication
  • 06/01/2012Data Protection from Insider ThreatsElisa Bertino

    As data represent a key asset for today's organizations, the problem of how to protect this data from theft and misuse is at the forefront of these organizations' minds. Even though today several data security techniques are available to protect data and computing infrastructures, many such techniques -- such as firewalls and network security tools -- are unable to protect data from attacks posed by those working on an organization's "inside." These "insiders" usually have authorized access to relevant information systems, making it extremely challenging to block the misuse of information while still allowing them to do their jobs. This book discusses several techniques that can provide effective protection against attacks posed by people working on the inside of an organization.

    Chapter One introduces the notion of insider threat and reports some data about data breaches due to insider threats. Chapter Two covers authentication and access control techniques, and Chapter Three shows how these general security techniques can be extended and used in the context of protection from insider threats. Chapter Four addresses anomaly detection techniques that are used to determine anomalies in data accesses by insiders. These anomalies are often indicative of potential insider data attacks and therefore play an important role in protection from these attacks.

    Security information and event management (SIEM) tools and fine-grained auditing are discussed in Chapter Five. These tools aim at collecting, analyzing, and correlating -- in real-time -- any information and event that may be relevant for the security of an organization. As such, they can be a key element in finding a solution to such undesirable insider threats. Chapter Six goes on to provide a survey of techniques for separation-of-duty (SoD). SoD is an important principle that, when implemented in systems and tools, can strengthen data protection from malicious insiders. However, to date, very few approaches have been proposed for implementing SoD in systems. In Chapter Seven, a short survey of a commercial product is presented, which provides different techniques for protection from malicious users with system privileges -- such as a DBA in database management systems. Finally, in Chapter Eight, the book concludes with a few remarks and additional research directions.

    View the Full Publication
  • 06/01/2012Cryptographic Key Management for Smart Power GridsMohamed Nabeel, John Zage, Sam Kerr, Elisa Bertino, Athula Kulatunga, U. Sudheera Navaratne, May Lou Duren

    The smart power grid promises to improve the efficiency and reliability of power delivery. This report introduces the logical components, associated technologies, security protocols, and network designs of the system. Undermining the potential benefits are security threats, and those threats related to cyber security are described in this report. Concentrating on the design of the smart meter and its communication links, this report describes the ZigBee technology and implementation, and the communication between the smart meter and the collector node, with emphasis on security attributes. It was observed that many of the secure features are based on keys that must be maintained; therefore, secure key management techniques become the basis for securing the entire grid. The descriptions of current key management techniques are delineated, highlighting their weaknesses. Finally, some initial research directions are outlined.

    View the Full Publication
  • 06/01/2012An energy-efficient approach for provenance transmission in wireless sensor networksS.M.I. Alam, Sonia Fahmy

    Assessing the trustworthiness of sensor data and transmitters of this data is critical for quality assurance. Trust evaluation frameworks utilize data provenance along with the sensed data values to compute the trustworthiness of each data item. However, in a sizeable multi-hop sensor network, provenance information requires a large and variable number of bits in each packet, resulting in high energy dissipation due to the extended period of radio communication, and making trust systems unusable. We propose energy-efficient provenance encoding and construction schemes, which we refer to as Probabilistic Provenance Flow (PPF). To the best of our knowledge, ours is the first work to make the Probabilistic Packet Marking (PPM) approach for IP traceback feasible for sensor networks. We design two bit-efficient provenance encoding schemes along with a complementary vanilla scheme. Depending on the network size and bit budget, we select the best method using mathematical approximations and numerical analysis. Our TOSSIM simulations demonstrate that the encoding schemes of PPF have identical performance with a low bit budget (~ 32-bit), requiring 33% fewer packets and 30% less energy than PPM variants to construct provenance. With a two-fold increase in bit budget, PPF with the selected encoding scheme reduces the energy consumption by 60%.
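
    A toy version of the probabilistic-marking idea PPF adapts from PPM: each forwarding node overwrites the packet's single mark slot with some probability, and the base station reconstructs the path from many packets. PPF's actual encodings are far more bit-efficient than this one-slot scheme; the marking probability and path below are assumptions.

    ```python
    # Probabilistic packet marking: over many packets, every hop's id
    # survives in some packets, with survival rates growing toward the sink.
    import random
    from collections import Counter

    def forward(path, p=0.3):
        """Send one packet along `path`; return the surviving mark (or None)."""
        mark = None
        for node in path:
            if random.random() < p:
                mark = node             # later nodes may overwrite the mark
        return mark

    random.seed(1)
    path = ["n1", "n2", "n3", "sink-neighbor"]
    marks = Counter(forward(path) for _ in range(5000))
    print({m: c for m, c in marks.items() if m})
    # Counts grow toward the sink, which lets the receiver order the hops.
    ```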

    View the Full Publication
  • 04/01/2012Whole-genome resequencing of two elite sires for the detection of haplotypes under selection in dairy cattle.DM Larkin, HD Daetwyler, AG Hernandez, CL Wright, LA Hetrick, L Boucek, SL Bachman, MR Brand, TV Akraiko, M Cohen-Zinder, Jyothi Thimmapuram, IM Macleod, TT Harkin, JE McCaque, ME Goddard, BJ Hayes, HA Lewin

    Using a combination of whole-genome resequencing and high-density genotyping arrays, genome-wide haplotypes were reconstructed for two of the most important bulls in the history of the dairy cattle industry, Pawnee Farm Arlinda Chief ("Chief") and his son Walkway Chief Mark ("Mark"), each accounting for ∼7% of all current genomes. We aligned 20.5 Gbp (∼7.3× coverage) and 37.9 Gbp (∼13.5× coverage) of the Chief and Mark genomic sequences, respectively. More than 1.3 million high-quality SNPs were detected in Chief and Mark sequences. The genome-wide haplotypes inherited by Mark from Chief were reconstructed using ∼1 million informative SNPs. Comparison of a set of 15,826 SNPs that overlapped in the sequence-based and BovineSNP50 SNPs showed the accuracy of the sequence-based haplotype reconstruction to be as high as 97%. By using the BovineSNP50 genotypes, the frequencies of Chief alleles on his two haplotypes then were determined in 1,149 of his descendants, and the distribution was compared with the frequencies that would be expected assuming no selection. We identified 49 chromosomal segments in which Chief alleles showed strong evidence of selection. Candidate polymorphisms for traits that have been under selection in the dairy cattle population then were identified by referencing Chief's DNA sequence within these selected chromosome blocks. Eleven candidate genes were identified with functions related to milk-production, fertility, and disease-resistance traits. These data demonstrate that haplotype reconstruction of an ancestral proband by whole-genome resequencing in combination with high-density SNP genotyping of descendants can be used for rapid, genome-wide identification of the ancestor's alleles that have been subjected to artificial selection.

    View the Full Publication
  • 03/24/2012Whole-genome resequencing of two elite sires for the detection of haplotypes under selection in dairy cattle Supporting InformationDenis M. Larkin, Hans D. Daetwyler, Alvaro G. Hernandez, Chris L. Wright, Lorie A. Hetrick, Lisa Boucek, Sharon Bachman, Jyothi Thimmapuram

    Using a combination of whole-genome resequencing and high-density genotyping arrays, genome-wide haplotypes were reconstructed for two of the most important bulls in the history of the dairy cattle industry, Pawnee Farm Arlinda Chief (“Chief”) and his son Walkway Chief Mark (“Mark”), each accounting for ∼7% of all current genomes. We aligned 20.5 Gbp (∼7.3× coverage) and 37.9 Gbp (∼13.5× coverage) of the Chief and Mark genomic sequences, respectively. More than 1.3 million high-quality SNPs were detected in Chief and Mark sequences. The genome-wide haplotypes inherited by Mark from Chief were reconstructed using ∼1 million informative SNPs. Comparison of a set of 15,826 SNPs that overlapped in the sequence-based and BovineSNP50 SNPs showed the accuracy of the sequence-based haplotype reconstruction to be as high as 97%. By using the BovineSNP50 genotypes, the frequencies of Chief alleles on his two haplotypes then were determined in 1,149 of his descendants, and the distribution was compared with the frequencies that would be expected assuming no selection. We identified 49 chromosomal segments in which Chief alleles showed strong evidence of selection. Candidate polymorphisms for traits that have been under selection in the dairy cattle population then were identified by referencing Chief’s DNA sequence within these selected chromosome blocks. Eleven candidate genes were identified with functions related to milk-production, fertility, and disease-resistance traits. These data demonstrate that haplotype reconstruction of an ancestral proband by whole-genome resequencing in combination with high-density SNP genotyping of descendants can be used for rapid, genome-wide identification of the ancestor’s alleles that have been subjected to artificial selection.

    View the Full Publication
  • 03/01/2012A Distributed Access Control Architecture for Cloud ComputingAbdulrahman Almutairi, Muhammed Sarfraz, Saleh Basalamah, Walid G. Aref, Arif Ghafoor

    The large-scale, dynamic, and heterogeneous nature of cloud computing poses numerous security challenges. But the cloud's main challenge is to provide a robust authorization mechanism that incorporates multitenancy and virtualization aspects of resources. The authors present a distributed architecture that incorporates principles from security management and software engineering and propose key requirements and a design model for the architecture.

    View the Full Publication
  • 03/01/2012Effective query generation and postprocessing strategies for prior art patent searchSuleyman Cetintas, Luo Si

    Rapid increase in global competition demands increased protection of intellectual property rights and underlines the importance of patents as major intellectual property documents. Prior art patent search is the task of identifying related patents for a given patent file, and is an essential step in judging the validity of a patent application. This article proposes an automated query generation and postprocessing method for prior art patent search. The proposed approach first constructs structured queries by combining terms extracted from different fields of a query patent and then reranks the retrieved patents by utilizing the International Patent Classification (IPC) code similarities between the query patent and the retrieved patents along with the retrieval score. An extensive set of empirical results carried out on a large-scale, real-world dataset shows that utilizing 20 or 30 query terms extracted from all fields of an original query patent according to their log(tf)·idf values helps form a representative search query out of the query patent and is more effective than using any number of query terms from any single field. It is shown that combining terms extracted from different fields of the query patent by giving higher importance to terms extracted from the abstract, claims, and description fields than to terms extracted from the title field is more effective than treating all extracted terms equally while forming the search query. Finally, utilizing the similarities between the IPC codes of the query patent and the retrieved patents is shown to be beneficial in improving the effectiveness of the prior art search.
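
    The term-scoring rule described above can be sketched in a few lines of Python. The tokenization, the field weights, and the toy statistics below are placeholders rather than the paper's tuned values; the point is simply scoring terms by log(tf)·idf with heavier weights for abstract, claims, and description than for title.

        import math
        from collections import Counter

        def top_query_terms(patent_fields, doc_freq, num_docs, k=20, weights=None):
            # Placeholder field weights: abstract/claims/description above title.
            weights = weights or {"title": 1.0, "abstract": 2.0,
                                  "claims": 2.0, "description": 2.0}
            tf = Counter()
            for field, text in patent_fields.items():
                w = weights.get(field, 1.0)
                for term in text.lower().split():
                    tf[term] += w
            # Score each term by log(tf) * idf and keep the top k as the query.
            score = {t: math.log(1 + f) * math.log(num_docs / (1 + doc_freq.get(t, 0)))
                     for t, f in tf.items()}
            return sorted(score, key=score.get, reverse=True)[:k]

        terms = top_query_terms(
            {"title": "hybrid engine control",
             "abstract": "a control unit regulates hybrid engine torque"},
            doc_freq={"a": 9000, "control": 1200, "hybrid": 150},
            num_docs=10000)
        print(terms)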

    View the Full Publication
  • 02/01/2012Special issue: best papers of VLDB 2010Paolo Atzeni, Elisa Bertino, Masaru Kitsuregawa, Kian-Lee Tan

    This special issue of the VLDB Journal is dedicated to the best papers from the 36th International Conference on Very Large Data Bases, which took place on 13–17 September 2010 in Singapore.

    The conference received 686 submissions overall. For the research tracks:

    • The Core Database Technology Track received 280 submissions; of these, 19 were rejected without review because of formatting violations and 48 were accepted. The acceptance rate, computed with respect to the reviewed papers, is 18.4%.

    • The Infrastructure for Information Systems Track received 215 submissions; of these, 11 were rejected without review because of formatting violations and 33 were accepted. The acceptance rate, computed with respect to the reviewed papers, is 16.1%.

    • The Experimental and Analysis Track received 15 submissions and accepted 4.

    View the Full Publication
  • 01/23/2012Efficient Leakage-free Authentication of Trees, Graphs and ForestsAshish Kundu, Mikhail J. Atallah, Elisa Bertino

    Leakage-free authentication of trees and graphs has been studied in the literature. Such schemes have several practical applications, especially in the cloud computing area. In this paper, we propose an authentication scheme that computes only one signature (optimal). Our scheme is not only highly efficient in the number of signatures it computes and in its runtime, but also highly versatile -- it can be applied not only to trees, but also to graphs and forests (disconnected trees and graphs). While achieving such efficiency and versatility, our scheme also achieves the desired security -- leakage-free authentication of data objects represented as trees, graphs and forests. This is achieved by another novel scheme proposed in this paper -- a secure naming scheme for the nodes of such data structures. The scheme assigns "secure names" to nodes such that these secure names can be used to verify the order between nodes efficiently, without leaking information about other nodes. As far as we know, ours is the first scheme in the literature that is optimal in its efficiency, supports two important security requirements -- authenticity and leakage-freeness (privacy preservation/confidentiality) -- and is versatile in its applicability to trees, graphs, and forests. We have carried out a complexity analysis as well as an experimental analysis of the scheme that corroborates its performance.

    View the Full Publication
  • 01/01/2012A Flexible Approach to Multisession Trust NegotiationsAnna Squicciarini, Elisa Bertino, Alberto Trombetta, Stefano Braghin

    Trust negotiation has been shown to be a successful, policy-driven approach for automated trust establishment through the release of digital credentials. Current real-world applications require new flexible approaches to trust negotiations, especially in light of the widespread use of mobile devices. In this paper, we present a multisession dependable approach to trust negotiations. The proposed framework supports voluntary and unpredicted interruptions, enabling the negotiating parties to complete the negotiation despite temporary unavailability of resources. Our protocols address issues related to validity, temporary loss of data, and extended unavailability of one of the two negotiators. A peer is able to suspend an ongoing negotiation and resume it with another (authenticated) peer. Negotiation portions and intermediate states can be safely and privately passed among peers, to guarantee the stability needed to continue suspended negotiations. We present a detailed analysis showing that our protocols have several key properties, including validity, correctness, and minimality. Also, we show how our negotiation protocol can withstand the most significant attacks. As shown by our complexity analysis, the introduction of the suspension and recovery procedures and of mobile negotiations does not significantly increase the complexity of ordinary negotiations. Our protocols require a constant number of messages whose size depends linearly on the portion of the trust negotiation carried out before the suspensions.

    View the Full Publication
  • 01/01/2012On practical specification and enforcement of obligationsNinghui Li, Haining Chen, Elisa Bertino

    Obligations are an important and indispensable part of many access control policies, such as those in DRM (Digital Rights Management) and healthcare information systems. To be able to use obligations in a real-world access control system, there must exist a language for specifying obligations. However, such a language is currently lacking. XACML (eXtensible Access Control Markup Language), the current de facto standard for specifying access control policies, integrates obligations as a part of it, but treats them largely as black boxes, without specifying what an obligation should include or how to handle it. In this paper we examine the challenges in designing a practical approach for specifying and handling obligations, and then propose a language for specifying obligations and an architecture for handling access control policies with these obligations, extending XACML's specification and architecture. In our design, obligations are modeled as state machines that communicate with the access control system and the outside world via events. We further implement our design in a prototype system named ExtXACML, based on SUN's XACML implementation. ExtXACML is extensible in that new obligation modules can be added to the system to handle various obligations for different applications, which demonstrates the power of our design.
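
    To make the state-machine view of obligations concrete, here is a minimal sketch in Python. The states, events, and the notification obligation are hypothetical examples for illustration, not ExtXACML's actual modules or interfaces.

        class NotifyObligation:
            # Toy obligation modeled as a state machine that reacts to events.
            # States and transitions here are illustrative only.
            def __init__(self):
                self.state = "inactive"

            def on_event(self, event):
                transitions = {
                    ("inactive", "access_granted"): "pending",
                    ("pending", "notification_sent"): "fulfilled",
                    ("pending", "deadline_expired"): "violated",
                }
                self.state = transitions.get((self.state, event), self.state)
                return self.state

        ob = NotifyObligation()
        for ev in ["access_granted", "notification_sent"]:
            print(ev, "->", ob.on_event(ev))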

    View the Full Publication
  • 01/01/2012Leakage-free redactable signaturesAshish Kundu, Mikhail J. Atallah, Elisa Bertino

    Redactable signatures for linear-structured data such as strings have already been studied in the literature. In this paper, we propose a formal security model for leakage-free redactable signatures (LFRS) that is general enough to address authentication of not only trees but also graphs and forests. LFRS schemes have several applications, especially in enabling secure data management in the emerging cloud computing paradigm as well as in healthcare, finance and biological applications. We have also formally defined the notion of secure names. Such secure names facilitate leakage-free verification of ordering between siblings/nodes. The paper also proposes a construction for secure names, and a construction for leakage-free redactable signatures based on the secure naming scheme. The proposed construction computes a linear number of signatures with respect to the size of the data object, and outputs only one signature that is stored, transmitted and used for authentication of any tree, graph and forest.

    View the Full Publication
  • 01/01/2012A hybrid approach of OpenMP for clustersOkwan Kwon, Fahed Jubair, Rudolf Eigenmann, Samuel Midkiff

    We present the first fully automated compiler-runtime system that successfully translates and executes OpenMP shared-address-space programs on laboratory-size clusters, for the complete set of regular, repetitive applications in the NAS Parallel Benchmarks. We introduce a hybrid compiler-runtime translation scheme. Compared to previous work, this scheme features a new runtime data flow analysis and new compiler techniques for improving data affinity and reducing communication costs. We present and discuss the performance of our translated programs, and compare them with the performance of the MPI, HPF and UPC versions of the benchmarks. The results show that our translated programs achieve 75% of the performance of the hand-coded MPI programs, on average.

    View the Full Publication
  • 01/01/2012Instance-specific multi-objective parameter tuning based on fuzzy logicJana Ries, Patrick Beullens, David Salt

    Finding good parameter values for meta-heuristics is known as the parameter setting problem. We propose a new instance-specific parameter tuning strategy, called IPTS, that takes the trade-off between solution quality and computational time into consideration. Two important steps in the method are an a priori statistical analysis to identify the factors that determine heuristic performance, in both quality and time, for a specific type of problem, and the transformation of these insights into a fuzzy inference system rule base that aims to return parameter values on the Pareto front with respect to a decision maker's preference.

    Applied to the symmetric Travelling Salesman Problem and the meta-heuristic Guided Local Search, the approach is consistently faster than a traditional non-instance-specific parameter tuning strategy without significantly affecting solution quality; optimised for speed, computational times are shown to be on average 20 times faster while producing solutions of similar quality. A number of interesting areas for further research are discussed.

    View the Full Publication
  • 01/01/2012A framework for verification and optimal reconfiguration of event-driven role based access control policiesBasit Shafiq, Jaideep Vaidya, Arif Ghafoor, Elisa Bertino

    Role based access control (RBAC) is the de facto model used for advanced access control due to its inherent richness and flexibility. Despite its great success at modeling a variety of organizational needs, maintaining large complex policies is a challenging problem. Conflicts within policies can expose the underlying system to numerous vulnerabilities and security risks. Therefore, more comprehensive verification tools for RBAC need to be developed to enable effective access control. In this paper, we propose a verification framework for detection and resolution of inconsistencies and conflicts in policies modeled through event-driven RBAC, an important subset of generalized temporal RBAC applicable to many domains, such as SCADA systems. We define the conflict resolution problem and propose an integer programming based heuristic. The proposed approach is generic and can be tuned to a variety of optimality measures.

    View the Full Publication
  • 01/01/2012Efficient privacy preserving content based publish subscribe systemsMohamed Nabeel, Ning Shang, Elisa Bertino

    The ability to seamlessly scale on demand has made Content-Based Publish-Subscribe (CBPS) systems the choice for distributing messages/documents produced by Content Publishers to many Subscribers through Content Brokers. Most current systems assume that Content Brokers are trusted for the confidentiality of the data published by Content Publishers and for the privacy of the subscriptions, which specify their interests, made by Subscribers. However, with the increased use of technologies such as service-oriented architectures and cloud computing, which essentially outsource the broker functionality to third-party providers, one can no longer assume this trust relationship to hold. The problem of providing privacy/confidentiality in CBPS systems is challenging, since the solution should allow Content Brokers to make routing decisions based on the content without revealing the content to them. Previous work that attempted to solve this problem was not fully successful. The problem may appear unsolvable since it involves conflicting goals, but in this paper, we propose a novel approach to preserve the privacy of the subscriptions made by Subscribers and the confidentiality of the data published by Content Publishers using cryptographic techniques when third-party Content Brokers are utilized to make routing decisions based on the content. Our protocols are expressive enough to support any type of subscription and are designed to work efficiently. We distribute the work such that the load on Content Brokers, which are the bottleneck in a CBPS system, is minimized. We extend a popular CBPS system using our protocols to implement a privacy-preserving CBPS system.

    View the Full Publication
  • 01/01/2012Emerging trends around big data analytics and security: panelRafae Bhatti, Ryan LaSalle, Rob Bird, Tim Grance, Elisa Bertino

    This panel will discuss key emerging trends at the intersection of big data analytics and security. With the explosion of big data and the advent of cloud computing, data analytics has not only become prevalent but also a critical business need. Internet applications today consume vast amounts of data collected from heterogeneous big data repositories and provide meaningful insights from it. These include applications for business forecasting, investment and finance, healthcare and well-being, and science and hi-tech, to name a few. Security and operational intelligence is one of the critical areas where big data analytics is expected to play a crucial role. Security analytics in a big data environment presents a unique set of challenges, not properly addressed by existing security incident and event monitoring (SIEM) systems, which typically work with a limited set of traditional data sources (firewall, IDS, etc.) in an enterprise network. A big data environment presents both a great opportunity and a challenge due to the explosion and heterogeneity of the potential data sources, which extend the boundary of analytics to social networks, real-time streams and other forms of highly contextual data characterized by high volume and speed. In addition to meeting infrastructure challenges, there remain additional unaddressed issues, including but not limited to the development of self-evolving threat ontologies, integrated network and application layer analytics, and detection of "low and slow" attacks. At the same time, security analytics requires a high degree of data assurance, where assurance implies that the data be trustworthy as well as managed in a privacy-preserving manner. Our panelists represent individuals from industry, academia, and government who are at the forefront of big data security analytics. They will provide insights into these unique challenges, survey the emerging trends, and lay out a vision for the future.

    View the Full Publication
  • 01/01/2012Collaborative Computing: Networking, Applications and WorksharingJames Joshi, Elisa Bertino, Calton Pu, Heri Ramampiaro

    Recent advances in computing have contributed to the growing interconnection of our world, including 3G/4G wireless networks, Web 2.0 technologies, and computing clouds, just to mention a few. The potential for collaboration among various components has exceeded the current capabilities of traditional approaches to system integration and interoperability. As the world heads towards unlimited connectivity and global mobile computing, collaboration becomes one of the fundamental challenges. We view collaborative computing as the glue that brings the components together and also the lubricant that makes them work together. The 4th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom) serves as a premier international forum for discussion among researchers and practitioners interested in collaborative networking, technology and systems, and applications.

    View the Full Publication
  • 01/01/2012A Comprehensive Model for ProvenanceSalmin Sultana, Elisa Bertino

    In this paper, we propose a provenance model able to represent the provenance of any data object captured at any abstraction layer (workflow/process/OS) and present an abstract schema of the model. The expressive nature of the model makes it well suited for use in real-world data processing systems.

    View the Full Publication
  • 01/01/2012A Hybrid Approach to Private Record MatchingAli Inan, Murat Kantarcioglu, Gabriel Ghinita, Elisa Bertino

    Real-world entities are not always represented by the same set of features in different data sets. Therefore, matching records of the same real-world entity distributed across these data sets is a challenging task. If the data sets contain private information, the problem becomes even more difficult. Existing solutions to this problem generally follow two approaches: sanitization techniques and cryptographic techniques. We propose a hybrid technique that combines these two approaches and enables users to trade off between privacy, accuracy, and cost. Our main contribution is the use of a blocking phase that operates over sanitized data to filter out, in a privacy-preserving manner, pairs of records that do not satisfy the matching condition. We also provide a formal definition of privacy and prove that the participants of our protocols learn nothing other than their share of the result and what can be inferred from their share of the result, their input and sanitized views of the input data sets (which are considered public information). Our method incurs considerably lower costs than cryptographic techniques and yields significantly more accurate matching results compared to sanitization techniques, even when privacy requirements are high.
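
    The role of the blocking phase can be illustrated with a toy sketch in Python. Here, generalizing zip codes to their first three digits stands in for the sanitization, and only pairs that share a sanitized block would proceed to the expensive cryptographic matching step; the records and the generalization function are invented for illustration.

        from collections import defaultdict
        from itertools import product

        def blocks(records, generalize):
            # Group records by their sanitized key so that only pairs
            # sharing a block are compared in the cryptographic phase.
            b = defaultdict(list)
            for r in records:
                b[generalize(r)].append(r)
            return b

        # Illustrative sanitization: keep only the first 3 digits of a zip code.
        gen = lambda r: r["zip"][:3]

        a = [{"id": 1, "zip": "47906"}, {"id": 2, "zip": "10001"}]
        b = [{"id": 7, "zip": "47907"}, {"id": 8, "zip": "60601"}]
        ba, bb = blocks(a, gen), blocks(b, gen)

        candidate_pairs = [
            (ra["id"], rb["id"])
            for key in ba.keys() & bb.keys()
            for ra, rb in product(ba[key], bb[key])
        ]
        print(candidate_pairs)  # only records with zip prefix 479 survive blocking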

    View the Full Publication
  • 01/01/2012Privacy-Preserving Enforcement of Spatially Aware RBACMichael Kirkpatrick, Gabriel Ghinita, Elisa Bertino

    Several models for incorporating spatial constraints into role-based access control (RBAC) have been proposed, and researchers are now focusing on the challenge of ensuring such policies are enforced correctly. However, existing approaches have a major shortcoming, as they assume the server is trustworthy and require complete disclosure of sensitive location information by the user. In this work, we propose a novel framework and a set of protocols to solve this problem. Specifically, in our scheme, a user provides a service provider with role and location tokens along with a request. The service provider consults with a role authority and a location authority to verify the tokens and evaluate the policy. However, none of the servers learn the requesting user's identity, role, or location. In this paper, we define the protocols and the policy enforcement scheme, and present a formal proof of a number of security properties.

    View the Full Publication
  • 01/01/2012Resilient Authenticated Execution of Critical Applications in Untrusted EnvironmentsMichael Kirkpatrick, Gabriel Ghinita, Elisa Bertino

    Modern computer systems are built on a foundation of software components from a variety of vendors. While critical applications may undergo extensive testing and evaluation procedures, the heterogeneity of software sources threatens the integrity of the execution environment for these trusted programs. For instance, if an attacker can combine an application exploit with a privilege escalation vulnerability, the operating system (OS) can become corrupted. Alternatively, a malicious or faulty device driver running with kernel privileges could threaten the application. While the importance of ensuring application integrity has been studied in prior work, proposed solutions immediately terminate the application once corruption is detected. Although this approach is sufficient for some cases, it is undesirable for many critical applications. In order to overcome this shortcoming, we have explored techniques for leveraging a trusted virtual machine monitor (VMM) to observe the application and potentially repair damage that occurs. In this paper, we describe our system design, which leverages efficient coding and authentication schemes, and we present the details of our prototype implementation to quantify the overhead of our approach. Our work shows that it is feasible to build a resilient execution environment, even in the presence of a corrupted OS kernel, with a reasonable amount of storage and performance overhead.

    View the Full Publication
  • 01/01/2012A Game-Theoretic Approach for High-Assurance of Data Trustworthiness in Sensor NetworksHyo-Sang Lim, Gabriel Ghinita, Elisa Bertino, Murat Kantarcioglu

    Sensor networks are being increasingly deployed in many application domains ranging from environment monitoring to supervising critical infrastructure systems (e.g., the power grid). Due to their ability to continuously collect large amounts of data, sensor networks represent a key component in decision-making, enabling timely situation assessment and response. However, sensors deployed in hostile environments may be subject to attacks by adversaries who intend to inject false data into the system. In this context, data trustworthiness is an important concern, as false readings may result in wrong decisions with serious consequences (e.g., large-scale power outages). To defend against this threat, it is important to establish trust levels for sensor nodes and adjust node trustworthiness scores to account for malicious interferences. In this paper, we develop a game-theoretic defense strategy to protect sensor nodes from attacks and to guarantee a high level of trustworthiness for sensed data. We use a discrete time model, and we consider that there is a limited attack budget that bounds the capability of the attacker in each round. The defense strategy objective is to ensure that sufficient sensor nodes are protected in each round such that the discrepancy between the value accepted and the truthful sensed value is below a certain threshold. We model the attack-defense interaction as a Stackelberg game, and we derive the Nash equilibrium condition that is sufficient to ensure that the sensed data are truthful within a nominal error bound. We implement a prototype of the proposed strategy and we show through extensive experiments that our solution provides an effective and efficient way of protecting sensor networks from attacks.

    View the Full Publication
  • 01/01/2012Privacy-Preserving and Content-Protecting Location Based QueriesRussell Paulet, Md. Golam Koasar, Xun Yi, Elisa Bertino

    In this paper we present a solution to one of the location-based query problems. This problem is defined as follows: (i) a user wants to query a database of location data, known as Points Of Interest (POI), and does not want to reveal his/her location to the server due to privacy concerns, (ii) the owner of the location data, that is, the location server, does not want to simply distribute its data to all users. The location server desires to have some control over its data, since the data is its asset. Previous solutions have used a trusted anonymiser to address privacy, but introduced the impracticality of trusting a third party. More recent solutions have used homomorphic encryption to remove this weakness. Briefly, the user submits his/her encrypted coordinates to the server and the server would determine the user's location homomorphically, and then the user would acquire the corresponding record using Private Information Retrieval techniques. We propose a major enhancement upon this result by introducing a similar two stage approach, where the homomorphic comparison step is replaced with Oblivious Transfer to achieve a more secure solution for both parties. The solution we present is efficient and practical in many scenarios. We also include the results of a working prototype to illustrate the efficiency of our protocol.

    View the Full Publication
  • 01/01/2012A Comprehensive Model for ProvenanceSalmin Sultana, Elisa Bertino

    In this paper, we propose a provenance model able to represent the provenance of any data object captured at any abstraction layer and present an abstract schema of the model. The expressive nature of the model enables a wide range of provenance queries. We also illustrate the utility of our model in real-world data processing systems.

    View the Full Publication
  • 01/01/2012Towards a theory for privacy preserving distributed OLAPAlfredo Cuzzocrea, Elisa Bertino, Domenico Sacca

    Privacy Preserving Distributed OLAP identifies a collection of models, methodologies and algorithms devoted to ensuring the privacy of multidimensional OLAP data cubes in distributed environments. While there is noticeable research on the practical and pragmatic aspects of privacy-preserving OLAP, in both centralized and distributed environments, the literature lacks contributions on the theoretical side of this emerging research topic. On the contrary, in our vision there is a significant need for theoretical results, which may bring benefits to a wide spectrum of aspects, such as privacy-preserving knowledge fruition schemes and query optimization. Inspired by these considerations, and starting from our previous research result in which the main privacy-preserving distributed OLAP framework was introduced, this paper proposes some theoretical results that nicely extend the capabilities and potentialities of that framework.

    View the Full Publication
  • 01/01/2012Demonstrating a lightweight data provenance for sensor networksBilal Shebaro, Salmin Sultana, Shakthidhar Gopavaram, Elisa Bertino

    The popularity of sensor networks and their many uses in critical domains such as military and healthcare make them more vulnerable to malicious attacks. In such contexts, the trustworthiness of sensor data and their provenance is critical for decision-making. In this demonstration, we present an efficient and secure approach for transmitting provenance information about sensor data. Our provenance approach uses lightweight in-packet Bloom filters that are encoded as sensor data travel through intermediate sensor nodes, and are decoded and verified at the base station. Our provenance technique is also able to defend against malicious attacks such as packet dropping, and allows one to detect the node responsible for packet drops. As such, it makes it possible to modify the transmission route to avoid nodes that could be compromised or malfunctioning. Our technique is designed to create a trustworthy environment for sensor nodes where only trusted data is processed.
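
    A minimal sketch of the in-packet Bloom filter idea follows; the filter size, the number of hash positions, and the node IDs are illustrative assumptions rather than the parameters used in the demonstration.

        import hashlib

        M = 64   # Bloom filter bits carried in each packet (illustrative)
        K = 3    # hash positions per inserted node ID (illustrative)

        def positions(node_id):
            # Derive K bit positions from a hash of the node ID.
            h = hashlib.sha256(node_id.encode()).digest()
            return [int.from_bytes(h[2*i:2*i+2], "big") % M for i in range(K)]

        def encode(path):
            # Each forwarding node ORs its bit positions into the filter.
            bf = 0
            for node in path:
                for p in positions(node):
                    bf |= 1 << p
            return bf

        def verify(bf, expected_path):
            # Base station: check every expected node is present in the filter.
            return all(all(bf >> p & 1 for p in positions(n)) for n in expected_path)

        bf = encode(["n1", "n5", "n9"])
        print(verify(bf, ["n1", "n5", "n9"]))   # True
        print(verify(bf, ["n1", "n6", "n9"]))   # False (up to false-positive rate)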

    View the Full Publication
  • 01/01/2012Marlin: making it harder to fish for gadgetsAditi Gupta, Sam Kerr, Michael Kirkpatrick, Elisa Bertino

    Code-reuse attacks, including return-oriented programming (ROP) and jump-oriented programming, bypass defenses against code injection by repurposing existing executable code in application binaries and shared libraries toward a malicious end. A common feature of these attacks is their reliance on knowledge of the layout of the executable code. We propose a fine-grained randomization-based approach that modifies the layout of executable code and hinders code-reuse attacks. Our solution consists solely of a modified dynamic loader that randomizes the internal structure of the executable code, thereby denying the attacker the necessary a priori knowledge for constructing the desired sequence of gadgets. Our approach has the advantage that it can be applied to any ELF binary, and every execution of this binary uses a different randomization. We describe the initial implementation of Marlin, a customized loader for randomization of executable code. Our work shows that such an approach is feasible and significantly increases the level of security against code-reuse attacks.
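
    The effect of per-execution layout randomization can be simulated in a few lines. This is a conceptual toy only: the real Marlin loader permutes blocks of executable code inside an ELF binary at load time, whereas here "functions" are just invented name/size pairs laid out at randomized offsets.

        import random

        # Hypothetical function blocks (name, size in bytes) for illustration.
        functions = [("f_open", 120), ("f_auth", 96), ("gadget_host", 48), ("f_io", 200)]

        def load(base=0x400000):
            # Fresh random layout on every execution of the "binary".
            layout = functions[:]
            random.shuffle(layout)
            addr, table = base, {}
            for name, size in layout:
                table[name] = addr
                addr += size
            return table

        run1, run2 = load(), load()
        hardcoded = run1["gadget_host"]   # address an attacker learned in run 1
        status = "stale" if run2["gadget_host"] != hardcoded else "reused by chance"
        print(hex(hardcoded), hex(run2["gadget_host"]), status)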

    View the Full Publication
  • 01/01/2012Mining contrastive opinions on political texts using cross-perspective topic modelYi Fang, Luo Si, Naveen Somasundaram, Zhengtao Yu

    This paper presents a novel opinion mining research problem, called Contrastive Opinion Modeling (COM). Given a query topic and a set of text collections from multiple perspectives, the task of COM is to present the opinions of the individual perspectives on the topic, and furthermore to quantify their difference. This general problem subsumes many interesting applications, including opinion summarization and forecasting, government intelligence and cross-cultural studies. We propose a novel unsupervised topic model for contrastive opinion modeling. It simulates the generative process of how opinion words occur in the documents of different collections. The ad hoc opinion search process can be efficiently accomplished based on the learned parameters in the model. The difference of perspectives can be quantified in a principled way by the Jensen-Shannon divergence among the individual topic-opinion distributions. An extensive set of experiments has been conducted to evaluate the proposed model on two datasets in the political domain: 1) statement records of U.S. senators; 2) world news reports from three representative media outlets in the U.S., China and India, respectively. The experimental results, with both qualitative and quantitative analysis, show the effectiveness of the proposed model.
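
    The divergence measure named above is standard and easy to state in code. In the sketch below, the two toy distributions stand in for learned topic-opinion word distributions from two perspectives; they are invented for illustration.

        import math

        def js_divergence(p, q):
            # Jensen-Shannon divergence between two discrete distributions.
            m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
            kl = lambda a, b: sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
            return 0.5 * kl(p, m) + 0.5 * kl(q, m)

        # Toy topic-opinion word distributions for one topic, two perspectives.
        perspective_a = [0.5, 0.3, 0.1, 0.1]
        perspective_b = [0.1, 0.2, 0.3, 0.4]
        print(js_divergence(perspective_a, perspective_b))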

    View the Full Publication
  • 01/01/2012Emotion tagging for comments of online news by meta classification with heterogeneous information sourcesYing Zhang, Yi Fang, Xiaojun Quan, Lin Dai, Luo Si, Xiaojie Yuan

    With the rapid growth of online news services, users can actively respond to online news by making comments. Users often express subjective emotions in comments such as sadness, surprise and anger. Such emotions can help understand the preferences and perspectives of individual users, and therefore may facilitate online publishers to provide users with more relevant services. This paper tackles the task of predicting emotions for the comments of online news. To the best of our knowledge, this is the first research work for addressing the task. In particular, this paper proposes a novel Meta classification approach that exploits heterogeneous information sources such as the content of the comments and the emotion tags of news articles generated by users. The experiments on two datasets from online news services demonstrate the effectiveness of the proposed approach.

    View the Full Publication
  • 01/01/2012Mixture model with multiple centralized retrieval algorithms for result merging in federated searchDzung Hong, Luo Si

    Result merging is an important research problem in federated search for merging documents retrieved from multiple ranked lists of selected information sources into a single list. State-of-the-art result merging algorithms such as Semi-Supervised Learning (SSL) and Sample-Agglomerate Fitting Estimate (SAFE) try to map document scores retrieved from different sources to comparable scores according to a single centralized retrieval algorithm for ranking those documents. Both SSL and SAFE arbitrarily select a single centralized retrieval algorithm for generating comparable document scores, which is problematic in a heterogeneous federated search environment, since a single centralized algorithm is often suboptimal for different information sources. Based on this observation, this paper proposes a novel approach for result merging by utilizing multiple centralized retrieval algorithms. One simple approach is to learn a set of combination weights for multiple centralized retrieval algorithms (e.g., by logistic regression) to compute comparable document scores. The paper shows that this simple approach generates suboptimal results, as it is not flexible enough to deal with heterogeneous information sources. A mixture probabilistic model is thus proposed to learn more appropriate combination weights with respect to different types of information sources with some training data. An extensive set of experiments on three datasets has demonstrated the effectiveness of the proposed new approach.
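
    The simple baseline mentioned above, a single learned combination of centralized scores, can be sketched as a logistic mapping. The weights and scores below are placeholders; the paper's mixture model instead learns weights that depend on the type of information source.

        import math

        def merged_score(centralized_scores, weights, bias=0.0):
            # Map a document's scores under several centralized retrieval
            # algorithms to one comparable merged score via a logistic
            # combination. Weights here are illustrative placeholders.
            z = bias + sum(w * s for w, s in zip(weights, centralized_scores))
            return 1.0 / (1.0 + math.exp(-z))

        # A document retrieved from some source, scored by two centralized algorithms.
        print(merged_score([1.8, 2.3], weights=[0.6, 0.4], bias=-1.0))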

    View the Full Publication
  • 01/01/2012Sentiment detection with auxiliary dataDan Zhang, Luo Si, Vernon Rego

    As an important application in text mining and social media, sentiment detection has attracted growing research interest, due to the expanding volume of available online information such as microblogging messages and review comments. Many machine learning methods have been proposed for sentiment detection. As a branch of machine learning, transfer learning is an important technique that tries to transfer knowledge from one domain to another. When applied to sentiment detection, existing transfer learning methods employ articles with human-labeled sentiments from other domains to help the sentiment detection on a target domain. Although most existing transfer learning methods are devoted to handling the difference in data distributions between domains, they only resort to approximation methods, which may introduce unnecessary biases. Furthermore, the popular assumption of existing transfer learning techniques on conditional probability is often too strong for practical applications. In this paper, we propose a novel method to model the distribution difference between different domains in sentiment detection by directly modeling the underlying joint distributions for the different domains. Some of the important properties of the proposed method, such as the convergence rate and time complexity, are analyzed. The experimental results on the product review dataset and the Twitter dataset demonstrate the advantages of the proposed method over state-of-the-art methods.

    View the Full Publication
  • 01/01/2012Initial results of using an intelligent tutoring system with AliceStephen Cooper, Yoon Jae Nam, Luo Si

    This paper describes the initial steps taken towards incorporating an intelligent tutoring system (ITS) into Alice. After initially describing an ITS, the paper focuses on the development of several tutorials for teaching specific introductory programming concepts that have been created using stencils. Initial results concerning usability and effectiveness of these stencil-based tutorials are provided.

    View the Full Publication
  • 01/01/2012Expertise RetrievalKrisztian Balog, Yi Fang, Maarten de Rijke, Pavel Serdyukov, Luo Si

    People have looked for experts since before the advent of computers. With advances in information retrieval technology, coupled with the large-scale availability of traces of knowledge-related activities, computer systems that can fully automate the process of locating expertise have become a reality. The past decade has witnessed tremendous interest and a wealth of results in expertise retrieval as an emerging subdiscipline in information retrieval. This survey highlights advances in models and algorithms relevant to this field. We draw connections among methods proposed in the literature and summarize them in five groups of basic approaches. These serve as the building blocks for more advanced models that arise when we consider a range of content-based factors that may impact the strength of association between a topic and a person. We also discuss practical aspects of building an expert search system and present applications of the technology in other domains such as blog distillation and entity retrieval. The limitations of current approaches are also pointed out. We end our survey with a set of conjectures on what the future may hold for expertise retrieval research.

    View the Full Publication
  • 01/01/2012A Discriminative Data-Dependent Mixture-Model Approach for Multiple Instance Learning in Image ClassificationQifan Wang, Luo Si, Dan Zhang

    Multiple Instance Learning (MIL) has been widely used in various applications, including image classification. However, existing MIL methods do not explicitly address the multi-target problem, where the distributions of positive instances are likely to be multi-modal. This strongly limits the performance of multiple instance learning in many real-world applications. To address this problem, this paper proposes a novel discriminative data-dependent mixture-model approach for multiple instance learning (MM-MIL) in image classification. The new method explicitly handles the multi-target problem by introducing a data-dependent mixture model, which allows positive instances to come from different clusters in a flexible manner. Furthermore, the kernelized representation of the proposed model allows effective and efficient learning in high-dimensional feature space. An extensive set of experimental results demonstrates that the proposed new MM-MIL approach substantially outperforms several state-of-the-art MIL algorithms on benchmark datasets.

    View the Full Publication
  • 01/01/2012Robust Nonnegative Matrix Factorization via L1 Norm RegularizationBin Shen, Luo Si, Rong Ji, Baodi Liu

    Nonnegative Matrix Factorization (NMF) is a widely used technique in many applications such as face recognition, motion segmentation, etc. It approximates the nonnegative data in an original high dimensional space with a linear representation in a low dimensional space by using the product of two nonnegative matrices. In many applications data are often partially corrupted with large additive noise. When the positions of noise are known, some existing variants of NMF can be applied by treating these corrupted entries as missing values. However, the positions are often unknown in many real world applications, which prevents the usage of traditional NMF or other existing variants of NMF. This paper proposes a Robust Nonnegative Matrix Factorization (RobustNMF) algorithm that explicitly models the partial corruption as large additive noise without requiring the information of positions of noise. In practice, large additive noise can be used to model outliers. In particular, the proposed method jointly approximates the clean data matrix with the product of two nonnegative matrices and estimates the positions and values of outliers/noise. An efficient iterative optimization algorithm with a solid theoretical justification has been proposed to learn the desired matrix factorization. Experimental results demonstrate the advantages of the proposed algorithm.
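
    One plausible way to write down the objective described above (hedged, since the abstract does not give the exact formulation) is to factor the data X into nonnegative factors U and V while fitting an explicit outlier matrix E whose entries are kept sparse by an L1 penalty:

        \min_{U \ge 0,\; V \ge 0,\; E}\ \lVert X - UV - E \rVert_F^2 + \lambda \lVert E \rVert_1

    Under such a formulation, the L1 term drives most entries of E to zero, so E absorbs the large additive noise at unknown positions while the product UV fits the clean data.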

    View the Full Publication
  • 01/01/2012A Bayesian Approach toward Active Learning for Collaborative FilteringRong Jin, Luo Si

    Collaborative filtering is a useful technique for exploiting the preference patterns of a group of users to predict the utility of items for the active user. In general, the performance of collaborative filtering depends on the number of rated examples given by the active user: the more rated examples the active user provides, the more accurate the predicted ratings will be. Active learning provides an effective way to acquire the most informative rated examples from active users. Previous work on active learning for collaborative filtering only considers the expected loss function based on the estimated model, which can be misleading when the estimated model is inaccurate. This paper takes one step further by taking into account the posterior distribution of the estimated model, which results in a more robust active learning algorithm. Empirical studies with movie rating datasets show that, when the number of ratings from the active user is restricted to be small, active learning methods based only on the estimated model do not perform well, while the active learning method using the model distribution achieves substantially better performance.
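
    The contrast between point-estimate and posterior-based selection can be illustrated with a toy acquisition rule. Disagreement (variance) across posterior samples is used here as a stand-in for the paper's expected-loss criterion; the samples and items are invented for illustration.

        import statistics

        # Toy sketch: three posterior samples of a rating model, each giving a
        # predicted rating per item (values invented for illustration).
        posterior_samples = [
            {"item_a": 4.1, "item_b": 2.0, "item_c": 3.9},
            {"item_a": 4.0, "item_b": 4.5, "item_c": 3.8},
            {"item_a": 3.9, "item_b": 1.5, "item_c": 4.0},
        ]

        def query_item(samples):
            # Ask the active user about the item on which the posterior
            # samples disagree most, rather than trusting a single model.
            items = samples[0].keys()
            var = {i: statistics.pvariance([s[i] for s in samples]) for i in items}
            return max(var, key=var.get)

        print(query_item(posterior_samples))   # item_b: the models disagree most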

    View the Full Publication
  • 01/01/2012A latent pairwise preference learning approach for recommendation from implicit feedbackYi Fang, Luo Si

    Most of the current recommender systems heavily rely on explicit user feedback such as ratings on items to model users' interests. However, in many applications, it is very hard to collect the explicit feedback, while implicit feedback such as user clicks may be more available. Furthermore, it is often more suitable for many recommender systems to address a ranking problem than a rating predicting problem. This paper proposes a latent pairwise preference learning (LPPL) approach for recommendation with implicit feedback. LPPL directly models user preferences with respect to a set of items rather than the rating scores on individual items, which are modeled with a set of features by analyzing clickthrough data available in many real-world recommender systems. The LPPL approach models both the latent variables of group structure of users and the pairwise preferences simultaneously. We conduct experiments on the testbed from a real-world recommender system and demonstrate that the proposed approach can effectively improve the recommendation performance against several baseline algorithms.

    View the Full Publication
  • 01/01/2012Reliable, Flexible Cloud Computing, Storage, Backup, Disaster RecoveryAbdulrahman Almutairi, Muhammed Sarfraz, Saleh Basalamah, Walid G. Aref, Arif Ghafoor

    The large-scale, dynamic, and heterogeneous nature of cloud computing poses numerous security challenges. But the cloud's main challenge is to provide a robust authorization mechanism that incorporates multitenancy and virtualization aspects of resources. The authors present a distributed architecture that incorporates principles from security management and software engineering and propose key requirements and a design model for the architecture.

    View the Full Publication
  • 01/01/2012Spatial Queries with Two kNN PredicatesAhmed Aly, Walid G. Aref, Mourad Ouzzani

    The widespread use of location-aware devices has led to countless location-based services in which a user query can be arbitrarily complex, i.e., one that embeds multiple spatial selection and join predicates. Amongst these predicates, the k-Nearest-Neighbor (kNN) predicate stands as one of the most important and widely used predicates. Unlike related research, this paper goes beyond the optimization of queries with single kNN predicates, and shows how queries with two kNN predicates can be optimized. In particular, the paper addresses the optimization of queries with: (i) two kNN-select predicates, (ii) two kNN-join predicates, and (iii) one kNN-join predicate and one kNN-select predicate. For each type of query, conceptually correct query evaluation plans (QEPs) and new algorithms that optimize the query execution time are presented. Experimental results demonstrate that the proposed algorithms outperform the conceptually correct QEPs by orders of magnitude.
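
    For intuition, a query with two kNN-select predicates can be evaluated by the conceptually correct plan of intersecting two independently computed kNN sets, as in the toy sketch below; the paper's algorithms avoid materializing both sets. The points and query locations are invented for illustration.

        # Points that are simultaneously among the k nearest to q1 AND to q2.
        points = {"p1": (0, 0), "p2": (1, 1), "p3": (5, 5), "p4": (6, 5), "p5": (2, 1)}

        def knn(q, k):
            # Brute-force kNN by squared Euclidean distance.
            d = lambda p: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
            return set(sorted(points, key=lambda n: d(points[n]))[:k])

        q1, q2, k = (0, 0), (2, 2), 3
        print(knn(q1, k) & knn(q2, k))   # points satisfying both kNN predicates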

    View the Full Publication
  • 01/01/2012Deep Web Query Interface Understanding and IntegrationEduard Dragut, Weiyi Meng, Clement Yu

    There are millions of searchable data sources on the Web and to a large extent their contents can only be reached through their own query interfaces. There is an enormous interest in making the data in these sources easily accessible. There are primarily two general approaches to achieve this objective. The first is to surface the contents of these sources from the deep Web and add the contents to the index of regular search engines. The second is to integrate the searching capabilities of these sources and support integrated access to them. In this book, we introduce the state-of-the-art techniques for extracting, understanding, and integrating the query interfaces of deep Web data sources. These techniques are critical for producing an integrated query interface for each domain. The interface serves as the mediator for searching all data sources in the concerned domain. While query interface integration is only relevant for the deep Web integration approach, the extraction and understanding of query interfaces are critical for both deep Web exploration approaches.

    This book aims to provide in-depth and comprehensive coverage of the key technologies needed to create high-quality integrated query interfaces automatically. The following technical issues are discussed in detail in this book: query interface modeling, query interface extraction, query interface clustering, query interface matching, query interface attribute integration, and query interface integration.

    View the Full Publication
  • 01/01/2012Ionomics Atlas: a tool to explore interconnected ionomic, genomic and environmental dataEduard Dragut, Mourad Ouzzani, Amgad Madkour, Mohamed Nabeel, Peter Baker, David Salt

    Ionomics Atlas facilitates access, analysis and interpretation of an existing large-scale heterogeneous dataset consisting of ionomic (the elemental composition of an organism), genetic (heritable changes in the DNA of an organism) and geographic information (geographic location, altitude, climate, soil properties, etc.). Ionomics Atlas allows connections to be made between the genetic regulation of the ionome of plant populations and their landscape distribution, allowing scientists to investigate the role of natural ionomic variation in the adaptation of populations to varied environmental conditions in the landscape. The goal of the Ionomics Atlas is twofold: (1) to allow both novice and expert users to easily access and explore layers of interconnected ionomic, genomic and environmental data; and (2) to facilitate hypothesis generation and testing by providing direct querying and browsing of the data as well as different display modes for the results.

    View the Full Publication
  • 01/01/2012Polarity Consistency Checking for Sentiment DictionariesEduard Dragut, Hong Wang, Clement Yu, Prasad Sistla, Weiyi Meng

    Polarity classification of words is important for applications such as Opinion Mining and Sentiment Analysis. A number of sentiment word/sense dictionaries have been manually or (semi)automatically constructed. These dictionaries have substantial inaccuracies. Besides obvious instances, where the same word appears with different polarities in different dictionaries, the dictionaries exhibit complex cases that cannot be detected by mere manual inspection. In this paper, we introduce the concept of polarity consistency of words/senses in sentiment dictionaries. We show that the consistency problem is NP-complete. We reduce the polarity consistency problem to the satisfiability problem and utilize a fast SAT solver to detect inconsistencies in a sentiment dictionary. We perform experiments on four sentiment dictionaries and WordNet.

    View the Full Publication
  • 01/01/2012Topic 11: Multicore and Manycore ProgrammingEduard Ayguade, Dionisios Pnevmatikatos, Rudolf Eigenmann, Mikel Lujan, Sabri Pllana

    Modern multicore and manycore systems enjoy the benefits of technology scaling and promise impressive performance. However, harvesting this potential is not straightforward. While multicore and manycore processors alleviate several problems related to single-core processors, known as the memory wall, power wall, and instruction-level parallelism wall, they raise the issue of programmability and programming effort. This topic focuses on novel solutions for multicore and manycore programmability and efficient programming in the context of general-purpose systems.

    View the Full Publication
  • 01/01/2012Spatial Queries with Two kNN PredicatesAhmed Aly, Mourad Ouzzani, Walid Aref

    The widespread use of location-aware devices has led to countless location-based services in which a user query can be arbitrarily complex, i.e., one that embeds multiple spatial selection and join predicates. Amongst these predicates, the k-Nearest-Neighbor (kNN) predicate stands as one of the most important and widely used. Unlike related research, this paper goes beyond the optimization of queries with single kNN predicates and shows how queries with two kNN predicates can be optimized. In particular, the paper addresses the optimization of queries with: (i) two kNN-select predicates, (ii) two kNN-join predicates, and (iii) one kNN-join predicate and one kNN-select predicate. For each type of query, conceptually correct query evaluation plans (QEPs) and new algorithms that optimize the query execution time are presented. Experimental results demonstrate that the proposed algorithms outperform the conceptually correct QEPs by orders of magnitude.
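
    For intuition, the sketch below evaluates a query with two kNN-select predicates in the naive, conceptually correct way: compute each kNN set independently, then intersect. The data points and query locations are hypothetical, and the paper's contribution is precisely the algorithms that avoid this brute-force plan.

        import math

        def knn(points, q, k):
            """Return the k points nearest to query location q (brute force)."""
            return sorted(points, key=lambda p: math.dist(p, q))[:k]

        restaurants = [(1, 1), (2, 5), (4, 4), (6, 1), (7, 7), (3, 2)]
        home, office = (0, 0), (5, 5)

        # Query: restaurants among the 3 nearest to home AND the 3 nearest to the office.
        near_home = set(knn(restaurants, home, 3))
        near_office = set(knn(restaurants, office, 3))
        print(near_home & near_office)  # -> {(2, 5)}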

    View the Full Publication
  • 01/01/2012Design and evaluation of the S3 monitor network measurement service on GENIEthan Blanton, S Chatterjee, S Gangam, S Kala, D Sharma, Sonia Fahmy, P Sharma

    Network monitoring capabilities are critical for both network operators and networked applications. In the context of an experimental test facility, network measurement is important for researchers experimenting with new network architectures and applications, as well as for operators of the test facility itself. The Global Environment for Network Innovations (GENI) is a sophisticated test facility comprising multiple “control frameworks.” In this paper, we describe the design and implementation of S3 Monitor, a scalable and extensible monitoring service for GENI. A key feature of S3 Monitor is a flexible design that allows easy “plug in” of new network measurement tools. We discuss our deployment experiences with S3 Monitor on GENI, and give experimental results to quantify the performance and system footprint of S3 Monitor. We find that the S3 Monitor service is light-weight and scales well as the number of paths to be monitored increases.

    View the Full Publication
  • 01/01/2012Detecting unsafe BGP policies in a flexible worldDebbie Perouli, Timmothy Griffin, Olaf Maennel, Sonia Fahmy, Cristel Pelsser, Alexander Gurney, Iain Phillips

    Internet Service Providers (ISPs) need to balance multiple opposing objectives. On one hand, they strive to offer innovative services to obtain competitive advantages; on the other, they have to interconnect with potentially competing ISPs to achieve reachability, and coordinate with them for certain services. The complexity of balancing these objectives is reflected in the diversity of policies of the Border Gateway Protocol (BGP), the standard inter-domain routing protocol. Unforeseen interactions among the BGP policies of different ISPs can cause routing anomalies. In this work, we propose a methodology to allow ISPs to check their BGP policy configurations for guaranteed convergence to a single stable state. This requires that a set of ISPs share their configurations with each other, or with a trusted third party. Compared to previous approaches to BGP safety, we (1) allow ISPs to use a richer set of policies, (2) do not modify the BGP protocol itself, and (3) detect not only instability, but also multiple stable states. Our methodology is based on the extension of current theoretical frameworks to relax their constraints and use incomplete data. We believe that this provides a rigorous foundation for the design and implementation of safety checking tools.

    View the Full Publication
  • 01/01/2012Detecting the unintended in BGP policiesDebbie Perouli, Timmothy Griffin, Olaf Maennel, Sonia Fahmy, Iain Phillips, Cristel Pelsser

    Internet Service Providers (ISPs) use routing policies to implement the requirements of business contracts, manage traffic, address security concerns and increase scalability of their network. These routing policies are often a high-level expression of strategies or intentions of the ISP. They have meaning when viewed from a network-wide perspective (e.g., mark on ingress, filter on egress). However, configuring these policies for the Border Gateway Protocol (BGP) is undertaken at a low-level, on a per router basis. Unintended routing outcomes have been observed. In this work, we define a language that allows analysis of network-wide configurations at the high-level. This language aims at bridging the gap between router configurations and abstract mathematical models capable of capturing complex policies. The language can be used to verify desired properties of routing protocols and hence detect potential unintended states of BGP. The language is accompanied by a tool suite that parses router configuration languages (which by their nature are vendor-dependent) and translates them into vendor-independent representations of policies.

    View the Full Publication
  • 01/01/2012Content retrieval using cloud-based DNSRavish Khosla, Sonia Fahmy, Y.C. Hu

    Cloud-computing systems are rapidly gaining momentum, providing flexible alternatives to many services. We study the Domain Name System (DNS) service, used to convert host names to IP addresses, which has historically been provided by a client's Internet Service Provider (ISP). With the advent of cloud-based DNS providers such as Google and OpenDNS, clients are increasingly using these DNS systems for URL and other name resolution. Performance degradation with cloud-based DNS has been reported, especially when accessing content hosted on highly distributed CDNs like Akamai. In this work, we investigate this problem in depth using Akamai as the content provider and Google DNS as the cloud-based DNS system. We demonstrate that the problem is rooted in the disparity between the number and location of servers of the two providers, and develop a new technique for geolocating data centers of cloud providers. Additionally, we explore the design space of methods for cloud-based DNS systems to be effective. Client-side, cloud-side, and hybrid approaches are presented and compared, with the goal of achieving the best client-perceived performance. Our work yields valuable insight into Akamai's DNS system, revealing previously unknown features.

    View the Full Publication
  • 01/01/2012Link correlation and network coding in broadcast protocols for wireless sensor networksSalmin Sultana, Y.C. Hu, Sonia Fahmy, S.M.I. Alam

    Correlated packet reception can be advantageous for sensor network broadcast protocols. By exploiting link correlation information, researchers have devised efficient single packet flooding protocols. In this work, we use testbed experiments to gain insight into the behavior of link correlation-aware broadcast protocols. We observe that, in the presence of varying link correlation, traditional link correlation-aware flooding mechanisms do not perform well in disseminating multiple packets due to reliability requirements and redundant transmissions. We conduct simulations to compare existing link correlation-aware flooding protocols with two versions of a multi-packet dissemination protocol, where one uses network coding and the other exploits both link correlation and network coding. Simulation results indicate the potential of the latter approach to be used as a reliable multi-packet dissemination protocol in practical scenarios. We also compare this protocol with existing multi-packet dissemination protocols, and reveal cases when certain protocols perform better than others.
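
    As a toy illustration of why network coding helps multi-packet dissemination (packet contents are hypothetical): a single XOR-coded broadcast can repair two different losses at two neighbors at once, instead of retransmitting each packet separately.

        def xor(a: bytes, b: bytes) -> bytes:
            return bytes(x ^ y for x, y in zip(a, b))

        p1, p2 = b"PKT-ONE!", b"PKT-TWO!"
        coded = xor(p1, p2)  # one broadcast of p1 XOR p2

        # Neighbor A has p1 but missed p2; neighbor B has p2 but missed p1.
        recovered_by_a = xor(coded, p1)
        recovered_by_b = xor(coded, p2)
        print(recovered_by_a == p2, recovered_by_b == p1)  # -> True True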

    View the Full Publication
  • 01/01/2012HandsOn DB: Managing Data Dependencies involving Human ActionsMohamed Eltabakh, Walid G. Aref, Ahmed Elmagarmid, Mourad Ouzzani

    IEEE Transactions on Knowledge and Data Engineering, 16 July 2013. IEEE Computer Society Digital Library.

    View the Full Publication
  • 01/01/2012Authenticated Top-K Aggregation in Distributed and Outsourced DatabasesSunoh Choi, Hyo-Sang Lim, Elisa Bertino

    Top-k queries have attracted interest in many different areas such as network and system monitoring, information retrieval, sensor networks, and so on. Since many applications today issue top-k queries on distributed and outsourced databases, authentication of top-k query results becomes more important. This paper addresses the problem of authenticated top-k aggregation queries (e.g., “find the k objects with the highest aggregate values”) in a distributed system. We propose a new algorithm, called Authenticated Three Phase Uniform Threshold (A-TPUT), which provides not only efficient top-k aggregation over distributed databases but also authentication of the top-k results. We also introduce several enhancements for A-TPUT to reduce both the computation cost and the communication cost. Finally, we confirm the efficiency of our solutions through an extensive experimental evaluation.
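
    The following sketch shows a simplified, unauthenticated version of the three-phase uniform-threshold idea underlying TPUT-style aggregation, with hypothetical per-node scores; A-TPUT additionally authenticates each phase.

        from collections import defaultdict

        nodes = [  # hypothetical per-node scores: object -> local value
            {"a": 9, "b": 7, "c": 3, "d": 1},
            {"a": 2, "b": 8, "c": 6, "d": 5},
            {"a": 4, "b": 1, "c": 9, "d": 2},
        ]
        k, m = 2, len(nodes)

        # Phase 1: each node reports its local top-k; partial sums lower-bound totals.
        partial = defaultdict(int)
        for node in nodes:
            for obj, val in sorted(node.items(), key=lambda kv: -kv[1])[:k]:
                partial[obj] += val
        tau = sorted(partial.values(), reverse=True)[k - 1]  # bound on k-th best total

        # Phase 2: fetch objects with local value >= tau/m; others cannot reach tau.
        candidates = {obj for node in nodes for obj, val in node.items() if val >= tau / m}

        # Phase 3: compute exact sums for candidates only, then take the top-k.
        totals = {obj: sum(node.get(obj, 0) for node in nodes) for obj in candidates}
        print(sorted(totals.items(), key=lambda kv: -kv[1])[:k])  # -> [('c', 18), ('b', 16)]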

    View the Full Publication
  • 01/01/2012Trusted Identities in CyberspaceElisa Bertino

    Secure and privacy-preserving digital identity management is a key requirement for secure use of the Internet and other online environments. However, the landscape of digital identity management is quite complex, with several different stakeholders. Here, the author discusses critical issues that must be addressed for the large-scale and effective deployment of digital identity solutions.

    View the Full Publication
  • 01/01/2012An Access Control Framework for WS-BPEL ProcessesFederica Paci, Elisa Bertino, Jason Crampton

    Business processes (the next generation of workflows) have attracted considerable research interest in the last fifteen years. More recently, several XML-based languages have been proposed for specifying and orchestrating business processes, resulting in the WS-BPEL language. Even though WS-BPEL has been developed to specify automated business processes that orchestrate activities of multiple Web services, there are many applications and situations requiring that people be considered as additional participants who can influence the execution of a process. Significant omissions from WS-BPEL are the specification of activities that require interactions with humans to be completed, called human activities, the specification of authorization information associating users with human activities in a WS-BPEL business process, and authorization constraints, such as separation of duty, on the execution of human activities. In this chapter, we address these deficiencies by introducing a new type of WS-BPEL activity to model human activities and by developing RBAC-WS-BPEL, a role-based access control model for WS-BPEL, and BPCL, a language to specify authorization constraints.

    View the Full Publication
  • 01/01/2012ARTL@ S and BasArt: A loose coupling strategy for digital humanitiesSorin Matei

    The core ARTL@S digital humanities strategy is that of loosely coupling resources, platforms, and use scenarios. A number of sites will feed from the same geodatabase (BasArt), which will be enriched by users with new content. Inspired by the Web 2.0 design principles, ARTL@S relies on the BasArt API, which will enable an ecosystem of sites to use primary data to generate their own maps, charts, and tables. A variety of economic models will also be used to support the site, from free to pay-based. User-generated content will be monitored by data management and curation techniques that will ensure the rigor of the scientific approach.

    View the Full Publication
  • 11/01/2011Irregularity in high-dimensional space-filling curvesMohamed Mokbel, Walid G. Aref

    A space-filling curve is a way of mapping the discrete multi-dimensional space into the one-dimensional space. It acts like a thread that passes through every cell element (or pixel) in the discrete multi-dimensional space so that every cell is visited exactly once. Thus, a space-filling curve imposes a linear order on the cells in the multi-dimensional space. There are numerous kinds of space-filling curves; the difference between such curves is in their way of mapping to the one-dimensional space. Selecting the appropriate curve for any application requires knowledge of the mapping scheme provided by each space-filling curve. Irregularity is proposed as a quantitative measure for the ordering quality imposed by space-filling curve mapping. The lower the irregularity, the better the space-filling curve in preserving the order of the discrete multi-dimensional space. Five space-filling curves (the Sweep, Scan, Peano, Gray, and Hilbert) are analyzed with respect to irregularity. Closed formulas are developed to compute the irregularity in any dimension k for a D-dimensional space-filling curve with grid size N. A comparative study of different space-filling curves with respect to irregularity is conducted and results are presented and discussed. We find that for an application that is biased toward one of the dimensions, the Sweep or the Scan space-filling curves are the best choice. For high-dimensional applications, the Peano space-filling curve would be the best choice. For applications that require fairness among various dimensions, the Hilbert and Gray space-filling curves are the best choice.
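
    For intuition, the sketch below shows how two of the analyzed curves linearize a 2D grid: the Sweep curve jumps between rows, while the Scan curve reverses direction on alternate rows so consecutive cells stay adjacent. (The irregularity formulas themselves are developed in the paper and are not reproduced here.)

        def sweep(n):
            """Sweep curve: every row traversed left to right."""
            return [(x, y) for y in range(n) for x in range(n)]

        def scan(n):
            """Scan (boustrophedon) curve: alternate rows reverse direction."""
            order = []
            for y in range(n):
                xs = range(n) if y % 2 == 0 else range(n - 1, -1, -1)
                order.extend((x, y) for x in xs)
            return order

        # Each curve imposes a linear order on the 4x4 grid, visiting every cell once.
        print(sweep(4)[:6])  # (0,0) (1,0) (2,0) (3,0) (0,1) (1,1) -- jump between rows
        print(scan(4)[:6])   # (0,0) (1,0) (2,0) (3,0) (3,1) (2,1) -- stays adjacent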

    View the Full Publication
  • 11/01/2011Mitochondrial Small RNAs that are Up-Regulated in Hippocampus during Olfactory Discrimination Training in MiceNeil R. Smalheiser, Giovanni Lugli, Jyothi Thimmapuram, Edwin H. Cook, John Larson

    Adult mice were trained to execute a nose-poke in a port containing one of two simultaneously present odors in order to obtain a reward. Hippocampus RNA of trained mice vs. controls was subjected to Illumina deep sequencing. Two mitochondrial RNAs (a tRNA and Mt-1) gave rise to 25–30-nt. small RNAs that showed a dramatic and specific increase with training (>50-fold relative to controls). Mt-1 is encoded within the termination association sequence (TAS) of the mitochondrial DNA control region. Small RNAs may link behavioral plasticity to protein synthesis and replication of mitochondria to support dendritic growth, spine stabilization, and synapse formation.

    View the Full Publication
  • 10/01/2011Fine-grained integration of access control policiesPrathima Rao, Dan Lin, Elisa Bertino, Ninghui Li, Jorge Lobo

    Collaborative and distributed applications, such as dynamic coalitions and virtualized grid computing, often require integrating the access control policies of collaborating parties. Such an integration must be able to support complex authorization specifications and the fine-grained integration requirements that the various parties may have. In this paper, we introduce an algebra for fine-grained integration of sophisticated policies. The algebra, which consists of three binary and two unary operations, is able to support the specification of a large variety of integration constraints. For ease of use, we also introduce a set of derived operators and provide guidelines for users to edit a policy with desired properties. To assess the expressive power of our algebra, we define a notion of completeness and prove that our algebra is complete and minimal with respect to that notion. We then propose a framework that uses the algebra for the fine-grained integration of policies expressed in XACML. We also present a methodology for generating the actual integrated XACML policy, based on the notion of Multi-Terminal Binary Decision Diagrams. Experimental results have demonstrated both the effectiveness and the efficiency of our approach. In addition, we discuss issues regarding obligations.

    View the Full Publication
  • 08/12/2011Transcriptome Sequencing of the Blind Subterranean Mole Rat, Spalax galili: Utility and Potential for the Discovery of Novel Evolutionary PatternsAssaf Malik, Abraham Korol, Sariel Hubner, Aaron Avivi, Alvaro Hernandez, Jyothi Thimmapuram, Shahjahan Ali, Mark Bond, Fabian Glaser, Arnon Paz

    The blind subterranean mole rat (Spalax ehrenbergi superspecies) is a model animal for survival under extreme environments due to its ability to live in underground habitats under severe hypoxic stress and darkness. Here we report the transcriptome sequencing of Spalax galili, a chromosomal type of S. ehrenbergi. cDNA pools from muscle and brain tissues isolated from animals exposed to hypoxic and normoxic conditions were sequenced using Sanger, GS FLX, and GS FLX Titanium technologies. Assembly of the sequences yielded over 51,000 isotigs with homology to ~12,000 mouse, rat or human genes. Based on these results, it was possible to detect large numbers of splice variants, SNPs, and novel transcribed regions. In addition, multiple differential expression patterns were detected between tissues and treatments. The results presented here will serve as a valuable resource for future studies aimed at identifying genes and gene regions evolved during the adaptive radiation associated with underground life of the blind mole rat.

    View the Full Publication
  • 08/01/2011Protecting information systems from insider threats - concepts and issuesElisa Bertino

    Summary form only given. Past research on information security has focused on protecting valuable resources from attacks by outsiders. However, statistics show that a large fraction of security and privacy breaches is due to insider attacks. Protection from insider threats is challenging because insiders may have access to many sensitive resources and high-privileged system accounts. Suitable approaches need to combine several security techniques, such as fine-grained access control, stronger authentication protocols, integrated digital identity management, and intrusion detection, with techniques from areas like information integration, machine learning, and risk assessment. In this talk, after an introduction to the problem of insider threats, we present recent work addressing the problem of anomaly detection and response policies for database management systems, and then discuss open research issues, emphasizing the role of techniques from the area of information integration.

    View the Full Publication
  • 07/01/2011ACConv -- An Access Control Model for Conversational Web ServicesFederica Paci, Massimo Mecella, Mourad Ouzzani, Elisa Bertino

    With organizations increasingly depending on Web services to build complex applications, security and privacy concerns including the protection of access control policies are becoming a serious issue. Ideally, service providers would like to make sure that clients have knowledge of only portions of the access control policy relevant to their interactions to the extent to which they are entrusted by the Web service and without restricting the client’s choices in terms of which operations to execute. We propose ACConv, a novel model for access control in Web services that is suitable when interactions between the client and the Web service are conversational and long-running. The conversation-based access control model proposed in this article allows service providers to limit how much knowledge clients have about the credentials specified in their access policies. This is achieved while reducing the number of times credentials are asked from clients and minimizing the risk that clients drop out of a conversation with the Web service before reaching a final state due to the lack of necessary credentials. Clients are requested to provide credentials, and hence are entrusted with part of the Web service access control policies, only for some specific granted conversations which are decided based on: (1) a level of trust that the Web service provider has vis-à-vis the client, (2) the operation that the client is about to invoke, and (3) meaningful conversations which represent conversations that lead to a final state from the current one. We have implemented the proposed approach in a software prototype and conducted extensive experiments to show its effectiveness.

    View the Full Publication
  • 07/01/2011Aggregated Privacy-Preserving Identity Verification for Composite Web ServicesNan Guo, Tianhan Gao, Ben Zhang, Ruchith Fernando, Elisa Bertino

    An aggregated privacy-preserving identity verification scheme is proposed for composite Web services. It aggregates multiple component providers' interactions of identity verification into a single one involving the user. Besides, it protects users from privacy disclosure through the adoption of zero-knowledge proofs of knowledge. This approach can dramatically reduce the computation time, independently of the number of identity attributes and component providers.

    View the Full Publication
  • 06/01/2011Design and Implementation of an Intrusion Response System for Relational DatabasesAshish Kamra, Elisa Bertino

    The intrusion response component of an overall intrusion detection system is responsible for issuing a suitable response to an anomalous request. We propose the notion of database response policies to support our intrusion response system tailored for a DBMS. Our interactive response policy language makes it very easy for database administrators to specify appropriate response actions for different circumstances, depending upon the nature of the anomalous request. The two main issues that we address in the context of such response policies are policy matching and policy administration. For the policy matching problem, we propose two algorithms that efficiently search the policy database for policies that match an anomalous request. We also extend the PostgreSQL DBMS with our policy matching mechanism, and report experimental results showing that our techniques are very efficient. The other issue that we address is the administration of response policies to prevent malicious modifications to policy objects by legitimate users. We propose a novel Joint Threshold Administration Model (JTAM) based on the principle of separation of duty. The key idea in JTAM is that a policy object is jointly administered by at least k database administrators (DBAs); that is, any modification made to a policy object is invalid unless it has been authorized by at least k DBAs. We present the design details of JTAM, which is based on a cryptographic threshold signature scheme, and show how JTAM prevents malicious modifications to policy objects by authorized users. We also implement JTAM in the PostgreSQL DBMS, and report experimental results on the efficiency of our techniques.

    View the Full Publication
  • 06/01/2011Profile-Based Selection of Accountability Policies in Grid Computing SystemsWonjun Lee, Anna Squicciarini, Elisa Bertino

    Accountability in grid computing systems is an important requirement, in that it makes it possible to control the activities of users and resource providers through the collection and analysis of accountability data. Accountability policies specify what to collect and when, and, more importantly, how to coordinate the data collection among different administrative domains. If elements of the data to be sent from one node to another are missing, or differ from the ones required by the policy because the node lacks sufficient capabilities to collect them, conflicts (an inability to comply) may occur. To solve such conflicts and yet obtain flexible accountability processes, we propose a profile-based policy selection mechanism. We show how, with this mechanism, we can adapt the accountability policies to the specific features of jobs and nodes, while at the same time achieving a minimum level of accountability.

    View the Full Publication
  • 06/01/2011A Provenance Based Mechanism to Identify Malicious Packet Dropping Adversaries in Sensor NetworksSalmin Sultana, Elisa Bertino, Mohamed Shehab

    Malicious packet dropping is a major security threat to the data traffic in a sensor network, since it reduces legal network throughput and may hinder the propagation of sensitive data. Dealing with this attack is challenging, since the unreliable wireless communication and resource constraints of the sensor network may cause communication failures and lead to incorrect decisions about the presence of such an attack. In this paper, we propose a data provenance based mechanism to detect the attack and identify the source of the attack, i.e., the malicious node. For this purpose, we utilize the characteristics of the watermarking based secure provenance transmission mechanism that we proposed earlier, and rely on the inter-packet timing characteristics after the provenance embedding. The scheme consists of three phases: (i) packet loss detection, (ii) identification of attack presence, and (iii) localization of the malicious node/link. Packet loss is detected based on the distribution of the inter-packet delays. The presence of the attack is determined by comparing the empirical average packet loss rate with the natural packet loss rate of the data flow path. To isolate the malicious link, we transmit more provenance information along with the sensor data. We present experimental results that show the high detection accuracy and energy efficiency of the proposed scheme.
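
    The sketch below illustrates only the generic first phase, under a strong simplifying assumption (ours, not the paper's) of a known nominal inter-packet interval: oversized gaps suggest lost packets, and the empirical loss rate is compared against an assumed natural loss rate. The paper instead works with the inter-packet delay distribution induced by provenance embedding.

        def estimate_losses(timestamps, interval):
            """Count packets presumed lost from oversized inter-packet gaps."""
            lost = 0
            for prev, cur in zip(timestamps, timestamps[1:]):
                lost += max(0, round((cur - prev) / interval) - 1)
            return lost

        arrivals = [0.0, 1.0, 2.0, 5.0, 6.0, 9.0]  # nominal interval: 1.0 s
        lost = estimate_losses(arrivals, interval=1.0)
        loss_rate = lost / (lost + len(arrivals))

        natural_loss_rate = 0.02  # assumed natural loss rate of the flow path
        print(f"lost={lost}, rate={loss_rate:.2f}, attack={loss_rate > natural_loss_rate}")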

    View the Full Publication
  • 06/01/2011A New Class of Buffer Overflow AttacksAshish Kundu, Elisa Bertino

    In this paper, we focus on a class of buffer overflow vulnerabilities that occur due to the “placement new” expression in C++. “Placement new” facilitates placement of an object/array at a specific memory location. When appropriate bounds checking is not in place, object overflows may occur. Such overflows can lead to stack as well as heap/data/bss overflows, which can be exploited by attackers in order to carry out the entire range of attacks associated with buffer overflow. Unfortunately, buffer overflows due to “placement new” have neither been studied in the literature nor been incorporated in any tool designed to detect and/or address buffer overflows. In this paper, we show how the “placement new” expression in C++ can be used to carry out buffer overflow attacks on the stack as well as heap/data/bss. We show that overflowing objects and arrays can also be used to carry out virtual table pointer subterfuge, as well as function and variable pointer subterfuge. Moreover, we show how “placement new” can be used to leak sensitive information, and how denial of service attacks can be carried out via memory leakage.

    View the Full Publication
  • 06/01/2011Partitioning Network Testbed ExperimentsWei-Min Yao, Sonia Fahmy

    Understanding the behavior of large-scale systems is challenging, but essential when designing new Internet protocols and applications. It is often infeasible or undesirable to conduct experiments directly on the Internet. Thus, simulation, emulation, and testbed experiments are important techniques for researchers to investigate large-scale systems. In this paper, we propose a platform-independent mechanism to partition a large network experiment into a set of small experiments that are sequentially executed. Each of the small experiments can be conducted on a given number of experimental nodes, e.g., the available machines on a testbed. Results from the small experiments approximate the results that would have been obtained from the original large experiment. We model the original experiment using a flow dependency graph. We partition this graph, after pruning uncongested links, to obtain a set of small experiments. We execute the small experiments in two iterations. In the second iteration, we model dependent partitions using information gathered about both the traffic and the network conditions during the first iteration. Experimental results from several simulation and testbed experiments demonstrate that our techniques approximate performance characteristics, even with closed-loop traffic and congested links. We expose the fundamental trade-off between the simplicity of the partitioning and experimentation process and the loss of experimental fidelity.
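
    A minimal sketch of the partitioning idea, assuming the networkx package and hypothetical link-utilization data: prune links below a congestion threshold and treat each remaining connected component of the dependency graph as one small experiment.

        import networkx as nx

        # Hypothetical links annotated with utilization; traffic sharing a
        # congested link must stay in the same sub-experiment.
        links = [
            ("n1", "n2", 0.95), ("n2", "n3", 0.10), ("n3", "n4", 0.90),
            ("n4", "n5", 0.05), ("n5", "n6", 0.85),
        ]
        CONGESTED = 0.8

        g = nx.Graph()
        g.add_nodes_from(n for a, b, _ in links for n in (a, b))
        g.add_edges_from((a, b) for a, b, util in links if util >= CONGESTED)

        # Each connected component becomes one small experiment.
        partitions = [sorted(c) for c in nx.connected_components(g)]
        print(partitions)  # e.g. [['n1', 'n2'], ['n3', 'n4'], ['n5', 'n6']]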

    View the Full Publication
  • 06/01/2011Mitigating interference in a network measurement serviceSriharsha Gangam, Sonia Fahmy

    Shared measurement services offer key advantages over conventional ad-hoc techniques for network monitoring. A measurement service may receive measurement requests concurrently from different applications and network administrators. These measurement requests are often served by injecting active network measurement traffic between two hosts. Two active measurements are said to interfere when the probe packets of one measurement tool are viewed as network traffic by the other. This may lead to faulty measurement readings. In this paper, we model the measurement interference problem, and show how to schedule measurement tasks to reduce interference and hence increase measurement accuracy. We propose twelve computationally tractable algorithms that decrease the total completion time (makespan) of measurement tasks, while avoiding interference. Our evaluation shows that the algorithm we refer to as Largest Area First, Busiest Node First - Earliest Interval Schedule (LAFBNF-EIS) has a mean makespan of about 5% more than the theoretical lower bound over our set of measurement workloads.
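
    The twelve algorithms themselves are detailed in the paper; the sketch below is a generic greedy scheduler in the same spirit, under our own assumptions (not the paper's) that a task's "area" is its duration times its probe rate, and that two measurements interfere when they overlap in time while sharing an endpoint.

        def schedule(tasks):
            """Greedy largest-area-first schedule; no concurrent tasks share a host."""
            placed = []  # (start, end, task)
            for t in sorted(tasks, key=lambda t: -(t["dur"] * t["rate"])):
                start = 0.0
                while True:
                    end = start + t["dur"]
                    clash = [p for p in placed
                             if p[0] < end and start < p[1]
                             and {t["src"], t["dst"]} & {p[2]["src"], p[2]["dst"]}]
                    if not clash:
                        placed.append((start, end, t))
                        break
                    start = min(p[1] for p in clash)  # retry after earliest clash ends
            return placed

        tasks = [
            {"src": "A", "dst": "B", "dur": 4, "rate": 2},
            {"src": "B", "dst": "C", "dur": 3, "rate": 1},
            {"src": "D", "dst": "E", "dur": 5, "rate": 1},
        ]
        for start, end, t in sorted(schedule(tasks), key=lambda x: (x[0], x[1])):
            print(f"{t['src']}->{t['dst']}: [{start}, {end}]")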

    View the Full Publication
  • 06/01/2011Energy-efficient provenance transmission in large-scale wireless sensor networksS.M.I. Alam, Sonia Fahmy

    Large-scale sensor-based decision support systems are being widely deployed. Assessing the trustworthiness of sensor data and the owners of this data is critical for quality assurance of decision making in these systems. Trust evaluation frameworks use data provenance along with the sensed data values to compute the trustworthiness of each data item. However, in a sizeable multi-hop sensor network, provenance information requires a large and variable number of bits in each packet, which, in turn, results in high energy dissipation with extended period of radio communication, making trust systems unusable. We propose an energy-efficient provenance transmission and construction scheme, which we refer to as Probabilistic Provenance Flow (PPF). To the best of our knowledge, ours is the first approach to make the Probabilistic Packet Marking (PPM) approach of IP traceback feasible for sensor networks. We propose two bit-efficient complementary provenance encoding and construction methods, and combine them to handle topological changes in the network. Our TOSSIM simulations demonstrate that PPF requires at least 33% fewer packets and consumes 30% less energy than PPM-based approaches to construct provenance, yet still provides high accuracy in trust score calculation.
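
    As a toy simulation of the probabilistic-marking idea that PPF adapts from PPM (the path, marking probability, and packet count are hypothetical): each forwarding node overwrites the packet's single mark slot with some probability, so across many packets the sink gradually collects every node on the path and can reconstruct the provenance without carrying the full path in each packet.

        import random

        def send_packet(path, p):
            """Each hop overwrites the packet's one mark slot with probability p."""
            mark = None
            for node in path:
                if random.random() < p:
                    mark = node
            return mark

        random.seed(7)
        path = ["s1", "s2", "s3", "s4"]  # sensor nodes along the data flow
        marks = {send_packet(path, p=0.3) for _ in range(500)}
        marks.discard(None)
        print(sorted(marks))  # after enough packets: the full node set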

    View the Full Publication
  • 05/01/2011Jitter-minimized reliability-maximized management of networksWaseem Sheikh, Arif Ghafoor

    We propose a joint optimization network management framework for quality-of-service (QoS) routing with resource allocation. Our joint optimization framework provides a convenient way of maximizing the reliability or minimizing the jitter delay of paths. Data traffic is sensitive to droppage at buffers, while it can tolerate jitter delay. On the other hand, multimedia traffic can tolerate loss but it is very sensitive to jitter delay. Depending on the type of data, our scheme provides a convenient way of selecting the parameters which result in either reliability maximization or jitter minimization. We solve the optimization problem for a GPS network and provide the optimal solutions. We find the values of control parameters which control the type of optimization performed. We use our analytical results in a multi-objective QoS routing algorithm. Finally, we provide insights into our optimization framework using simulations.

    View the Full Publication
  • 05/01/2011Maximum Margin Multiple Instance Clustering with its Applications to Image and Text ClusteringDan Zhang, Fei Wang, Luo Si, Tao Li

    In multiple instance learning problems, patterns are often given as bags, and each bag consists of some instances. Most existing research in the area focuses on multiple instance classification and multiple instance regression, while very limited work has been conducted on multiple instance clustering (MIC). This paper formulates a novel framework, maximum margin multiple instance clustering (M³IC), for MIC. However, it is impractical to directly solve the optimization problem of M³IC. Therefore, M³IC is relaxed in this paper to enable an efficient optimization solution with a combination of the constrained concave-convex procedure and the cutting plane method. Furthermore, this paper presents some important properties of the proposed method and discusses the relationship between the proposed method and some other related ones. An extensive set of empirical results demonstrates the advantages of the proposed method over existing research in terms of both effectiveness and efficiency.

    View the Full Publication
  • 05/01/2011Privacy-Preserving Updates to Anonymous and Confidential DatabasesAlberto Trombetta, Wei Jiang, Elisa Bertino, Lorenzo Bossi

    Suppose Alice owns a k-anonymous database and needs to determine whether her database, when a tuple owned by Bob is inserted, remains k-anonymous. Also, suppose that access to the database is strictly controlled, because, for example, the data are used for certain experiments that must be kept confidential. Clearly, allowing Alice to directly read the contents of the tuple breaks the privacy of Bob (e.g., a patient's medical record); on the other hand, the confidentiality of the database managed by Alice is violated once Bob has access to the contents of the database. Thus, the problem is to check whether the database, with the tuple inserted, is still k-anonymous, without letting Alice and Bob know the contents of the tuple and the database, respectively. In this paper, we propose two protocols solving this problem on suppression-based and generalization-based k-anonymous and confidential databases. The protocols rely on well-known cryptographic assumptions, and we provide theoretical analyses to prove their soundness and experimental results to illustrate their efficiency.
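
    The protocols themselves are cryptographic, but the predicate both parties ultimately evaluate is simple. The following plain (non-private) sketch, with hypothetical quasi-identifiers, checks whether a table remains k-anonymous after Bob's tuple is inserted: every quasi-identifier group must still contain at least k records.

        from collections import Counter

        QI = ("zip", "age_range")  # quasi-identifier attributes

        def is_k_anonymous(rows, k):
            groups = Counter(tuple(r[a] for a in QI) for r in rows)
            return all(count >= k for count in groups.values())

        db = [
            {"zip": "479**", "age_range": "20-30", "disease": "flu"},
            {"zip": "479**", "age_range": "20-30", "disease": "cold"},
            {"zip": "606**", "age_range": "40-50", "disease": "asthma"},
            {"zip": "606**", "age_range": "40-50", "disease": "flu"},
        ]
        bob = {"zip": "606**", "age_range": "20-30", "disease": "flu"}

        print(is_k_anonymous(db, k=2))          # -> True
        print(is_k_anonymous(db + [bob], k=2))  # -> False: Bob forms a singleton group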

    View the Full Publication
  • 05/01/2011Detection and Protection against Distributed Denial of Service Attacks in Accountable Grid Computing SystemsWonjun Lee, Anna Squicciarini, Elisa Bertino

    By exploiting existing vulnerabilities, malicious parties can take advantage of resources made available by grid systems to attack mission-critical websites or the grid itself. In this paper, we present two approaches for protecting against attacks targeting sites outside or inside the grid. Our approach is based on special-purpose software agents that collect provenance and resource usage data in order to perform detection and protection. We show the effectiveness and the efficiency of our approach by conducting various experiments on an emulated grid test-bed.

    View the Full Publication
  • 04/05/2011Orchardgrass (Dactylis glomerata L.) EST and SSR marker development, annotation, and transferability.B.S. Bushman, S.R. Larson, M Tuna, M.S. West, A.G. Hernandez, D Vullaganti, G Gong, J.G. Robbins, K. B. Jensen, Jyothi Thimmapuram

    Orchardgrass, or cocksfoot [Dactylis glomerata (L.)], has been naturalized on nearly every continent and is a commonly used species for forage and hay production. All major cultivated varieties of orchardgrass are autotetraploid, and few tools or information are available for functional and comparative genetic analyses and improvement of the species. To improve the genetic resources for orchardgrass, we have developed an EST library and SSR markers from salt-, drought-, and cold-stressed tissues. The ESTs were bi-directionally sequenced from clones and combined into 17,373 unigenes. Unigenes were annotated based on putative orthology to genes from rice, Triticeae grasses, other Poaceae, Arabidopsis, and the non-redundant database of the NCBI. Of 1,162 SSR markers developed, approximately 80% showed amplification products across a set of orchardgrass germplasm, and 40% across related Festuca and Lolium species. When orchardgrass subspecies were genotyped using 33 SSR markers, their within-accession similarity values ranged from 0.44 to 0.71, with Mediterranean accessions having higher similarity. The total number of genotyped bands was greater for tetraploid accessions than for diploid accessions. Clustering analysis indicated grouping of Mediterranean subspecies and central Asian subspecies, while D. glomerata ssp. aschersoniana was most closely related to three cultivated varieties.

    View the Full Publication
  • 04/01/2011On the Cost of Network Inference MechanismsEthan Blanton, Sonia Fahmy, Greg Frederickson, Sriharsha Gangam

    A number of network path delay, loss, or bandwidth inference mechanisms have been proposed over the past decade. Concurrently, several network measurement services have been deployed over the Internet and intranets. We consider inference mechanisms that use O(n) end-to-end measurements to predict the O(n²) end-to-end pairwise measurements among n nodes, and investigate when it is beneficial to use them in measurement services. In particular, we address the following questions: (1) For which measurement request patterns would using an inference mechanism be advantageous? (2) How does a measurement service determine the set of hosts that should utilize inference mechanisms, as opposed to those that are better served by direct end-to-end measurements? We explore three solutions that identify groups of hosts which are likely to benefit from inference. We compare these solutions in terms of effectiveness and algorithmic complexity. Results with synthetic data sets and data sets from a popular peer-to-peer system demonstrate that our techniques accurately identify host subsets that benefit from inference, in significantly less time than an algorithm that identifies optimal subsets. The measurement savings are large when measurement request patterns exhibit small-world characteristics, which is often the case. (Part of this work, focusing on one of the three solutions presented in this paper, appeared in [1].)

    View the Full Publication
  • 04/01/2011BGP molecules: Understanding and predicting prefix failuresRavish Khosla, Sonia Fahmy, Y.C. Hu

    The Border Gateway Protocol (BGP), the de facto Internet interdomain routing protocol, disseminates information about Internet prefixes to Autonomous Systems (ASes). Prefixes are announced and withdrawn as routes and policies change, making them unreachable from portions of the Internet for certain time periods. This paper aims to predict routing failures of prefixes in the Internet. We investigate the similarity of prefixes in the Internet with respect to their propensity to fail, i.e., become unreachable. Given a prefix of interest, we define a “BGP molecule”: the prefixes in the Internet that are likely to fail together with this prefix. We show that the AS paths to prefixes, coupled with knowledge of the prefix's geographical location, contribute to its failure tendency. The BGP molecules constructed are used in four failure prediction schemes, among which a hybrid scheme achieves 91% predictability of failures with 99.3% coverage of prefixes in the Internet.

    View the Full Publication
  • 03/01/2011DECHO—a framework for the digital exploration of cultural heritage objectsDaniel G. Aliaga, Elisa Bertino, Stefano Valtolina

    We present a framework for the digital exploration of cultural heritage objects. Today computing and information technology is pervasive and ubiquitous, and has boosted information diffusion and productivity to unprecedented levels. Such technology is now a ripe context for succinctly gathering knowledge by combining, in innovative ways, powerful visualization tactics, rapid access to a significant amount of relevant information, domain-specific knowledge, and rich and pervasive tools to sort, group, and slice the information and knowledge in different ways. To this end, we present a complete framework that is easy to use, does not require expensive custom equipment, and has been designed to help archaeology researchers and educators reconstruct and analyze the historical context of cultural heritage objects. Our main inspiration is that archaeology today would benefit significantly from having spur-of-the-moment access to information from a variety of heterogeneous data sources, and from letting multiple participants visually observe factual and visual data in an intuitive and natural setting. While we present a framework geared towards archaeology, in the long term we envision reusing it in a variety of fields.

    Our framework includes data acquisition, data management, and data visualization components. The data acquisition component enables the fast, easy, and accurate addition of 3D object models and factual data, including narrations. The data management component includes a novel semantic database system that provides an intuitive view of the available contents in terms of an ontology, supports the addition of narrations, integrates data stored by other databases, and supports object retrieval, browsing, and knowledge navigation. The data visualization component provides visual feedback, which is a crucial part of an exploratory endeavor. It provides the ability to alter the appearance of archaeological objects, complete fragments of 3D object models, and several compelling forms of digital inspection and information visualization. All algorithms exploit knowledge from the database and from the obtained 3D models. Visuals can be applied on top of the physical object or on a 3D model shown in a traditional display, controllable via a Web page interface.

    View the Full Publication
  • 02/01/2011Privacy Preserving OLAP over Distributed XML Data: A Theoretically-Sound Secure-Multiparty-Computation ApproachAlfredo Cuzzocrea, Elisa Bertino

    Privacy-preserving distributed OLAP is becoming a critical challenge for next-generation Business Intelligence (BI) scenarios, due to the “natural suitability” of OLAP for analyzing distributed massive BI repositories in a multidimensional and multi-granularity manner. In particular, in these scenarios XML-formatted BI repositories play a dominant role, due to the well-known amenities of XML in modeling and representing distributed business data. However, while privacy-preserving distributed data mining has been widely investigated, the problem of effectively and efficiently supporting privacy-preserving OLAP over distributed collections of XML documents, which is relevant in practice, has been neglected so far. In order to fill this gap, we propose a novel Secure Multiparty Computation (SMC)-based privacy-preserving OLAP framework for distributed collections of XML documents. The framework has many novel features, ranging from nice theoretical properties to an effective and efficient protocol, called the Secure Distributed OLAP aggregation protocol (SDO). The efficiency of our approach has been validated by an experimental evaluation over distributed collections of synthetic, benchmark, and real-life XML documents.

    View the Full Publication
  • 02/01/2011Guided data repairAhmed Elmagarmid, Jennifer Neville, Mourad Ouzzani, Ihab Ilyas, Mohamed Yakout

    In this paper we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. GDR consults the user on the updates that are most likely to be beneficial in improving data quality. GDR also uses machine learning methods to identify and apply the correct updates directly to the database without the actual involvement of the user in these specific updates. To rank potential updates for consultation by the user, we first group these repairs and quantify the utility of each group using the decision-theoretic concept of value of information (VOI). We then apply active learning to order updates within a group based on their ability to improve the learned model. User feedback is used to repair the database and to adaptively refine the training set for the model. We empirically evaluate GDR on a real-world dataset and show significant improvement in data quality using our user-guided repair process. We also assess the trade-off between user effort and the resulting data quality.

    View the Full Publication
  • 02/01/2011Prediction models for long-term Internet prefix availabilityRavish Khosla, Sonia Fahmy, Y.C. Hu, Jennifer Neville

    The Border Gateway Protocol (BGP) maintains inter-domain routing information by announcing and withdrawing IP prefixes. These routing updates can cause prefixes to be unreachable for periods of time, reducing prefix availability observed from different vantage points on the Internet. The observed prefix availability values may not meet the standards promised by Service Level Agreements (SLAs).

    In this paper, we develop a framework for predicting long-term availability of prefixes, given short-duration prefix information from publicly available BGP routing databases like RouteViews, and prediction models constructed from information about other Internet prefixes. We compare three prediction models and find that machine learning-based prediction methods outperform a baseline model that predicts the future availability of a prefix to be the same as its past availability. Our results show that mean time to failure is the most important attribute for predicting availability. We also quantify how prefix availability is related to prefix length and update frequency. Our prediction models achieve 82% accuracy and 0.7 ranking quality when predicting for a future duration equal to the learning duration. We can also predict for a longer future duration, with graceful performance reduction. Our models allow ISPs to adjust BGP routing policies if predicted availability is low, and are generally useful for cloud computing systems, content distribution networks, P2P, and VoIP applications.
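
    A minimal sketch of the comparison on synthetic data, assuming scikit-learn and NumPy; the feature set mirrors the attributes named above (past availability, mean time to failure, prefix length, update frequency), but the data and the labeling rule are fabricated for illustration only.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        n = 1000
        # Synthetic prefixes: [past_availability, mean_time_to_failure_hrs,
        #                      prefix_length, updates_per_day]
        X = np.column_stack([
            rng.uniform(0.9, 1.0, n), rng.exponential(200, n),
            rng.integers(8, 33, n), rng.poisson(3, n),
        ])
        # Fabricated label: "highly available in the future", driven mostly by MTTF.
        y = (X[:, 1] > 150).astype(int)

        train, test = slice(0, 800), slice(800, None)
        model = DecisionTreeClassifier(max_depth=4).fit(X[train], y[train])
        print("tree accuracy:", (model.predict(X[test]) == y[test]).mean())

        # Baseline: predict high availability whenever past availability was high.
        baseline = (X[test, 0] > 0.99).astype(int)
        print("baseline accuracy:", (baseline == y[test]).mean())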

    View the Full Publication
  • 02/01/2011Forwarding devices: From measurements to simulationsRoman Chertov, Sonia Fahmy

    Most popular simulation and emulation tools use high-level models of forwarding behavior in switches and routers, and give little guidance on setting model parameters such as buffer sizes. Thus, a myriad of papers report results that are highly sensitive to the forwarding model or buffer size used. Incorrect conclusions are often drawn from these results about transport or application protocol performance, service provisioning, or vulnerability to attacks. In this article, we argue that measurement-based models for routers and other forwarding devices are necessary. We devise such a model and validate it with measurements from three types of Cisco routers and one Juniper router, under varying traffic conditions. The structure of our model is device-independent, but the model uses device specific parameters. The compactness of the parameters and simplicity of the model make it versatile for high-fidelity simulations that preserve simulation scalability. We construct a profiler to infer the parameters within a few hours. Our results indicate that our model approximates different types of routers significantly better than the default ns-2 simulator models. The results also indicate that queue characteristics vary dramatically among the devices we measure, and that backplane contention can be a factor.

    View the Full Publication
  • 01/01/2011A conditional purpose-based access control model with dynamic rolesEnamul Kabir, Hua Wang, Elisa Bertino

    This paper presents a model for privacy-preserving access control based on a variety of purposes. Conditional purpose is applied along with allowed purpose and prohibited purpose in the model; it allows users to use certain data for a given purpose under specified conditions. The structure of the conditional purpose-based access control model is defined and investigated through dynamic roles. Access purpose is verified dynamically, based on subject attributes, context attributes and authorization policies. Intended purposes are dynamically associated with the requested data object during the access decision. An algorithm is developed to compute compliance between access purposes and intended purposes, and is combined with role-based access control (RBAC) in a dynamic manner to support conditional purpose-based access control. With this model, more information can be extracted from data providers while at the same time assuring privacy, maximizing the usability of consumers' data. It extends traditional access control models to further cover privacy preservation in data mining settings. The structure helps enterprises to circulate clear privacy promises, and to collect and manage user preferences and consent.
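
    A minimal sketch of the compliance check between an access purpose and the intended purposes attached to a data object (the purpose names and the condition are hypothetical): prohibited purposes always deny, allowed purposes grant, and conditional purposes grant only when their condition holds in the current context.

        def compliant(access_purpose, intended, context):
            """intended: {'allowed': set, 'conditional': {purpose: condition_fn},
                          'prohibited': set}"""
            if access_purpose in intended["prohibited"]:
                return False
            if access_purpose in intended["allowed"]:
                return True
            cond = intended["conditional"].get(access_purpose)
            return cond is not None and cond(context)

        intended = {
            "allowed": {"treatment"},
            "conditional": {"research": lambda ctx: ctx.get("anonymized", False)},
            "prohibited": {"marketing"},
        }
        print(compliant("treatment", intended, {}))                   # -> True
        print(compliant("research", intended, {"anonymized": True}))  # -> True
        print(compliant("marketing", intended, {}))                   # -> False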

    View the Full Publication
  • 01/01/2011Trust establishment in the formation of Virtual OrganizationsAnna Squicciarini, Federica Paci, Elisa Bertino

    Virtual Organizations (VOs) represent a new collaboration paradigm in which the participating entities pool resources, services, and information to achieve a common goal. VOs represent an interesting approach for companies to achieve new and profitable business opportunities by being able to dynamically partner with others. Thus, choosing the appropriate VO partners is a crucial aspect. Ensuring trustworthiness of the members is also fundamental for making the best decisions. In this paper, we show how trust negotiation represents an effective means to select the best possible members during different stages in the VO lifecycle. We base our discussion on concrete application scenarios and illustrate the tools created by us that integrate trust negotiation with a VO Management toolkit.

    View the Full Publication
  • 01/01/2011Efficient systematic clustering method for k-anonymizationEnamul Kabir, Hua Wang, Elisa Bertino

    This paper presents a clustering-based k-anonymization technique to minimize information loss while at the same time assuring data quality. (Clustering partitions records into clusters such that records within a cluster are similar to each other, while records in different clusters are distinct from one another.) Privacy preservation of individuals has drawn considerable interest in data mining research. The k-anonymity model proposed by Samarati and Sweeney is a practical approach to data privacy preservation and has been studied extensively over the last few years. Anonymization methods via generalization or suppression are able to protect private information, but lose valued information. The challenge is how to minimize the information loss during the anonymization process. We refer to this challenge as the systematic clustering problem for k-anonymization, which is analysed in this paper. The proposed technique groups similar data together and then anonymizes each group individually. The structure of the systematic clustering problem is defined and investigated through paradigm and properties. An algorithm for the proposed problem is developed, and its time complexity is shown to be O(n²/k), where n is the total number of records of the individuals whose privacy is of concern. Experimental results show that our method attains a reasonable dominance with respect to both information loss and execution time. Finally, the algorithm illustrates its usability for incremental datasets.
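
    As a toy variant of the clustering step (one-dimensional records and a greedy strategy are our simplifications, not the paper's systematic method): repeatedly seed a cluster, absorb the k-1 nearest remaining records, and generalize each cluster's quasi-identifier to a range.

        def k_member_clusters(records, k):
            """Greedily form clusters of at least k 1-D records (ages here)."""
            remaining = sorted(records)
            clusters = []
            while len(remaining) >= 2 * k:
                seed, remaining = remaining[0], remaining[1:]
                # absorb the k-1 records closest to the seed
                remaining.sort(key=lambda r: abs(r - seed))
                cluster, remaining = [seed] + remaining[:k - 1], remaining[k - 1:]
                clusters.append(cluster)
            clusters.append(remaining)  # leftovers (still at least k records)
            return clusters

        ages = [23, 25, 31, 33, 52, 55, 58, 61]
        for c in k_member_clusters(ages, k=2):
            print(f"ages {min(c)}-{max(c)}: {sorted(c)}")  # generalized QI range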

    View the Full Publication
  • 01/01/2011Prox-RBAC: A Proximity-based Spatially Aware RBACMichael Kirkpatrick, Maria Luisa Damiani, Elisa Bertino

    As mobile computing devices become increasingly dominant in enterprise and government organizations, the need for fine-grained access control in these environments continues to grow. Specifically, advanced forms of access control can be deployed to ensure that authorized users can access sensitive resources only when in trusted locations. One technique that has been proposed is to augment role-based access control (RBAC) with spatial constraints. In such a system, an authorized user must be in a designated location in order to exercise the privileges associated with a role. In this work, we extend spatially aware RBAC systems by defining the notion of proximity-based RBAC. In our approach, access control decisions are not based solely on the requesting user's location. Instead, we also consider the location of other users in the system. For instance, a policy in a government application could prevent access to a sensitive document if any civilians are present. We introduce our spatial model and the notion of proximity constraints. We define the syntax and semantics of the Prox-RBAC language, which can be used to specify these policy constraints. We introduce our enforcement architecture, including the protocols and algorithms for enforcing Prox-RBAC policies, and give a proof of functional correctness. Finally, we describe our work toward a Prox-RBAC prototype and present an informal security analysis.
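
    A minimal sketch of enforcing one proximity constraint (the roles, positions, and radius are hypothetical): the request is denied whenever any user holding a forbidden role is within the protected radius of the requester.

        import math

        users = {  # user -> (role, position)
            "alice": ("officer",  (0.0, 0.0)),
            "bob":   ("civilian", (3.0, 4.0)),
            "carol": ("officer",  (20.0, 0.0)),
        }

        def proximity_ok(requester, forbidden_role, radius):
            """Deny if any forbidden-role user is within `radius` of the requester."""
            _, pos = users[requester]
            return all(
                math.dist(pos, p) > radius
                for name, (role, p) in users.items()
                if name != requester and role == forbidden_role
            )

        print(proximity_ok("alice", "civilian", radius=10.0))  # -> False: bob is 5.0 away
        print(proximity_ok("carol", "civilian", radius=10.0))  # -> True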

    View the Full Publication
  • 01/01/2011Privacy-Preserving Assessment of Location Data TrustworthinessChenyun Dai, Fang-Yu Rao, Gabriel Ghinita, Elisa Bertino

    Assessing the trustworthiness of location data corresponding to individuals is essential in several applications, such as forensic science and epidemic control. To obtain accurate and trustworthy location data, analysts must often gather and correlate information from several independent sources, e.g., physical observation, witness testimony, surveillance footage, etc. However, such information may be fraudulent, its accuracy may be low, and its volume may be insufficient to ensure highly trustworthy data. On the other hand, recent advancements in mobile computing and positioning systems, e.g., GPS-enabled cell phones, highway sensors, etc., bring new and effective technological means to track the location of an individual. Nevertheless, collection and sharing of such data must be done in ways that do not violate an individual's right to personal privacy. Previous research efforts acknowledged the importance of assessing location data trustworthiness, but they assume that data is available to the analyst in direct, unperturbed form. However, such an assumption is not realistic, due to the fact that repositories of personal location data must conform to privacy regulations. In this paper, we study the challenging problem of refining the trustworthiness of location data with the help of large repositories of anonymized information. We show how two important trustworthiness evaluation techniques, namely common pattern analysis and conflict/support analysis, can benefit from the use of anonymized location data. We have implemented a prototype of the proposed privacy-preserving trustworthiness evaluation techniques, and the experimental results demonstrate that using anonymized data can significantly help in improving the accuracy of location trustworthiness assessment.

    View the Full Publication
  • 01/01/2011Federated Search. Foundations and Trends in Information Retrieval (FTIR)Milad Shokouhi, Luo Si

    Federated search (federated information retrieval or distributed information retrieval) is a technique for searching multiple text collections simultaneously. Queries are submitted to a subset of collections that are most likely to return relevant answers. The results returned by selected collections are integrated and merged into a single list. Federated search is preferred over centralized search alternatives in many environments. For example, commercial search engines such as Google cannot easily index uncrawlable hidden web collections while federated search systems can search the contents of hidden web collections without crawling. In enterprise environments, where each organization maintains an independent search engine, federated search techniques can provide parallel search over multiple collections. There are three major challenges in federated search. For each query, a subset of collections that are most likely to return relevant documents are selected. This creates the collection selection problem. To be able to select suitable collections, federated search systems need to acquire some knowledge about the contents of each collection, creating the collection representation problem. The results returned from the selected collections are merged before the final presentation to the user. This final step is the result merging problem. The goal of this work is to provide a comprehensive summary of the previous research on the federated search challenges described above.
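
    A minimal sketch of the three steps, with invented data and deliberately simplified scoring (the actual selection and merging algorithms surveyed in this work are far more sophisticated):

    ```python
    # Hypothetical illustration of federated search: collection representation
    # (term statistics), collection selection, and result merging.
    from collections import Counter

    collections = {
        "news":   ["election results tonight", "sports scores update"],
        "papers": ["federated search survey", "result merging methods"],
    }

    # Representation: a term-frequency profile per collection.
    profiles = {c: Counter(" ".join(docs).split()) for c, docs in collections.items()}

    def select(query, k=1):
        """Selection: rank collections by query-term frequency in their profile."""
        scores = {c: sum(p[t] for t in query.split()) for c, p in profiles.items()}
        return sorted(scores, key=scores.get, reverse=True)[:k]

    def merge(query, chosen):
        """Merging: score each returned document, then fuse into a single list."""
        results = [(doc, sum(doc.split().count(t) for t in query.split()))
                   for c in chosen for doc in collections[c]]
        return sorted(results, key=lambda r: r[1], reverse=True)

    print(merge("result merging", select("result merging")))
    ```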

    View the Full Publication
  • 01/01/2011Document clustering with universumDan Zhang, Jingdong Wang, Luo Si

    Document clustering is a popular research topic, which aims to partition documents into groups of similar objects (i.e., clusters), and has been widely used in many applications such as automatic topic extraction, document organization and filtering. As a recently proposed concept, Universum is a collection of "non-examples" that do not belong to any concept/cluster of interest. This paper proposes a novel document clustering technique -- Document Clustering with Universum, which utilizes the Universum examples to improve the clustering performance. The intuition is that the Universum examples can serve as supervised information and help improve the performance of clustering, since they are known not to belong to any meaningful concepts/clusters in the target domain. In particular, a maximum margin clustering method is proposed to model both target examples and Universum examples for clustering. An extensive set of experiments is conducted to demonstrate the effectiveness and efficiency of the proposed algorithm.

    View the Full Publication
  • 01/01/2011Composite Hashing with Multiple Information SourcesDan Zhang, Fei Wang, Luo Si

    Similarity search applications with a large amount of text and image data demand an efficient and effective solution. One useful strategy is to represent the examples in databases as compact binary codes through semantic hashing, which has attracted much attention due to its fast query/search speed and drastically reduced storage requirement. All of the current semantic hashing methods only deal with the case when each example is represented by one type of features. However, examples are often described from several different information sources in many real world applications. For example, the characteristics of a webpage can be derived from both its content part and its associated links.

    To address the problem of learning good hashing codes in this scenario, we propose a novel research problem: Composite Hashing with Multiple Information Sources (CHMIS). The focus of the new research problem is to design an algorithm for incorporating the features from different information sources into the binary hashing codes efficiently and effectively. In particular, we propose an algorithm CHMIS-AW (CHMIS with Adjusted Weights) for learning the codes. The proposed algorithm integrates information from several different sources into the binary hashing codes by adjusting the weights on each individual source for maximizing the coding performance, and enables fast conversion from query examples to their binary hashing codes. Experimental results on five different datasets demonstrate the superior performance of the proposed method against several other state-of-the-art semantic hashing techniques.
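
    The core idea can be sketched as follows; the fixed per-source weights and random projections below are stand-ins for what CHMIS-AW actually learns:

    ```python
    # Toy sketch of composite hashing: features from several sources are
    # projected, combined with per-source weights, and thresholded into bits.
    # Weights and projections are assumed fixed here, not learned as in the paper.
    import numpy as np

    rng = np.random.default_rng(0)
    n, code_bits = 4, 8
    sources = {"text": rng.normal(size=(n, 20)), "links": rng.normal(size=(n, 5))}
    weights = {"text": 0.7, "links": 0.3}   # assumed per-source weights
    proj = {name: rng.normal(size=(X.shape[1], code_bits)) for name, X in sources.items()}

    def hash_codes(sources, weights, proj):
        """Weighted sum of per-source projections, then sign -> binary codes."""
        combined = sum(w * (sources[s] @ proj[s]) for s, w in weights.items())
        return (combined > 0).astype(np.uint8)

    print(hash_codes(sources, weights, proj))   # n x code_bits binary matrix
    ```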

    View the Full Publication
  • 01/01/2011Learning Beyond the Predefined Label SpaceDan Zhang, Yan Liu, Luo Si

    Most traditional supervised learning methods are developed to learn a model from labeled examples and use this model to classify the unlabeled ones into the same label space predefined by the models. However, in many real world applications, the label spaces for both the labeled/training and unlabeled/testing examples can be different. To solve this problem, this paper proposes a novel notion of Serendipitous Learning (SL), which is defined to address the learning scenarios in which the label space can be enlarged during the testing phase. In particular, a large margin approach is proposed to solve SL. The basic idea is to leverage the knowledge in the labeled examples to help identify novel/unknown classes, and the large margin formulation is proposed to incorporate both the classification loss on the examples within the known categories, as well as the clustering loss on the examples in unknown categories. An efficient optimization algorithm based on CCCP and the bundle method is proposed to solve the optimization problem of the large margin formulation of SL. Moreover, an efficient online learning method is proposed to address the issue of large-scale data in online learning scenarios, which has been shown to have a guaranteed learning regret. An extensive set of experimental results on two synthetic datasets and two datasets from real world applications demonstrates the advantages of the proposed method over several other baseline algorithms. One limitation of the proposed method is that the number of unknown classes is given in advance. It may be possible to remove this constraint if we model it in a non-parametric way. We also plan to do experiments on more real world applications in the future.

    View the Full Publication
  • 01/01/2011Multi-View Transfer Learning with a Large Margin ApproachDan Zhang, Jingrui He, Yan Liu, Luo Si, Richard Lawrence

    Transfer learning has been proposed to address the problem of scarcity of labeled data in the target domain by leveraging the data from the source domain. In many real world applications, data is often represented from different perspectives, which correspond to multiple views. For example, a web page can be described by its contents and its associated links. However, most existing transfer learning methods fail to capture the multi-view nature, and might not be best suited for such applications.

    To better leverage both the labeled data from the source domain and the features from different views, this paper proposes a general framework: Multi-View Transfer Learning with a Large Margin Approach (MVTL-LM). On one hand, labeled data from the source domain is effectively utilized to construct a large margin classifier; on the other hand, the data from both domains is employed to impose consistencies among multiple views. As an instantiation of this framework, we propose an efficient optimization method, which is guaranteed to converge to ε precision in O(1/ε) steps. Furthermore, we analyze its error bound, which improves over existing results of related methods. An extensive set of experiments is conducted to demonstrate the advantages of our proposed method over state-of-the-art techniques.

    View the Full Publication
  • 01/01/2011Identifying Similar People in Professional Social Networks with Discriminative Probabilistic ModelsSuleyman Cetintas, Monica Rogati, Luo Si, Yi Fang

    Identifying similar professionals is an important task for many core services in professional social networks. Information about users can be obtained from heterogeneous information sources, and different sources provide different insights on user similarity.

    This paper proposes a discriminative probabilistic model that identifies latent content and graph classes for people with similar profile content and social graph similarity patterns, and learns a specialized similarity model for each latent class. To the best of our knowledge, this is the first work on identifying similar professionals in professional social networks, and the first work that identifies latent classes to learn a separate similarity model for each latent class. Experiments on a real-world dataset demonstrate the effectiveness of the proposed discriminative learning model.

    View the Full Publication
  • 01/01/2011Forecasting Counts of User Visits for Online Display Advertising with Prob. Latent Class ModelsSuleyman Cetintas, Datong Chen, Luo Si, Ben Chen, Zhanibek Datbayev

    Display advertising is a multi-billion dollar industry where advertisers promote their products to users by having publishers display their advertisements on popular Web pages. An important problem in online advertising is how to forecast the number of user visits for a Web page during a particular period of time. Prior research addressed the problem by using traditional time-series forecasting techniques on historical data of user visits (e.g., via a single regression model built for forecasting based on historical data for all Web pages), and did not fully explore the fact that different types of Web pages have different patterns of user visits.

    In this paper we propose a probabilistic latent class model to automatically learn the underlying user visit patterns among multiple Web pages. Experiments carried out on real-world data demonstrate the advantage of using latent classes in forecasting online user visits.

    View the Full Publication
  • 01/01/2011A Weighted Curve Fitting Method for Result Merging in Federated SearchChuan He, Dzung Hong, Luo Si

    Result merging is an important step in federated search to merge the documents returned from multiple source-specific ranked lists for a user query. Previous result merging methods such as Semi-Supervised Learning (SSL) and Sample-Agglomerate Fitting Estimate (SAFE) use regression methods to estimate global document scores from document ranks in individual ranked lists. SSL relies on overlapping documents that exist in both individual ranked lists and a centralized sample database. SAFE goes a step further by using both overlapping documents with accurate rank information and documents with estimated rank information for regression. However, existing methods do not distinguish the accurate rank information from the estimated information. Furthermore, all documents are assigned equal weights in regression while intuitively, documents at the top should carry higher weights. This paper proposes a weighted curve fitting method for result merging in federated search. The new method explicitly models the importance of information from overlapping documents over non-overlapping ones. It also weights documents at different positions differently. Empirical results on two datasets clearly demonstrate the advantage of the proposed algorithm.
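
    A hedged sketch of the weighting idea using ordinary weighted least squares; the particular weighting scheme (overlap boost, inverse-rank weights) is an assumption for illustration, not the paper's exact formulation:

    ```python
    # Weighted regression from ranks to global scores: overlap documents get
    # higher weight than estimated points, and top ranks outweigh lower ones.
    import numpy as np

    # (rank in source list, centralized sample score, is_overlap) for one source
    points = [(1, 0.92, True), (3, 0.80, True), (5, 0.55, False), (8, 0.40, False)]
    ranks  = np.array([p[0] for p in points], dtype=float)
    scores = np.array([p[1] for p in points])
    w = np.array([(2.0 if p[2] else 1.0) / p[0] for p in points])  # assumed weights

    coeffs = np.polyfit(ranks, scores, deg=1, w=w)   # weighted least squares fit
    estimate = np.poly1d(coeffs)
    print(estimate(2))   # estimated global score for the document at rank 2
    ```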

    View the Full Publication
  • 01/01/2011Analysis of an Expert Search Query LogYi Fang, Naveen Somasundaram, Luo Si, Jeongwoo Ko, Aditya P. Mathur

    Expert search has made rapid progress in modeling, algorithms, and evaluations in recent years. However, there is very little work on analyzing how users interact with expert search systems. In this paper, we conduct an analysis of an expert search query log. The aim is to understand the special characteristics of expert search usage. To the best of our knowledge, this is one of the earliest works on expert search query log analysis. We find that expert search users generally issue shorter queries, more common queries, and use more advanced search features, with fewer queries in a session, than general Web search users do. This study explores a new research direction in expert search by analyzing and exploiting query logs.

    View the Full Publication
  • 01/01/2011On the Complexity of Authorization in RBAC under Qualification and Security ConstraintsYuqing Sun, Qihua Wang, Ninghui Li, Elisa Bertino, Mikhail J. Atallah

    In practice, assigning access permissions to users must satisfy a variety of constraints motivated by business and security requirements. Here, we focus on Role-Based Access Control (RBAC) systems, in which access permissions are assigned to roles and roles are then assigned to users. User-role assignment is subject to role-based constraints, such as mutual exclusion constraints, prerequisite constraints, and role-cardinality constraints. Also, whether a user is qualified for a role depends on whether his/her qualification satisfies the role's requirements. In other words, a role can only be assigned to a certain set of qualified users. In this paper, we study fundamental problems related to access control constraints and user-role assignment, such as determining whether there are conflicts in a set of constraints, verifying whether a user-role assignment satisfies all constraints, and how to generate a valid user-role assignment for a system configuration. Computational complexity results and/or algorithms are given for the problems we consider.
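
    For instance, verifying whether a user-role assignment satisfies such a constraint set can be sketched as below; the constraint encodings and example data are invented:

    ```python
    # Toy checker for the three constraint types discussed: mutual exclusion,
    # prerequisites, and role cardinality. Names and encodings are assumptions.
    def valid_assignment(assignment, mutex, prereq, cardinality):
        roles_of = assignment  # user -> set of roles
        # Mutual exclusion: no user holds two conflicting roles.
        for roles in roles_of.values():
            if any(a in roles and b in roles for a, b in mutex):
                return False
        # Prerequisite: role r requires role p for the same user.
        for roles in roles_of.values():
            if any(r in roles and p not in roles for r, p in prereq):
                return False
        # Cardinality: each role is assigned to at most its cap of users.
        for role, cap in cardinality.items():
            if sum(role in roles for roles in roles_of.values()) > cap:
                return False
        return True

    assignment = {"alice": {"auditor"}, "bob": {"teller", "manager"}}
    print(valid_assignment(assignment,
                           mutex=[("teller", "auditor")],
                           prereq=[("manager", "teller")],
                           cardinality={"manager": 1}))  # True
    ```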

    View the Full Publication
  • 01/01/2011Fine-Grained Cloaking of Sensitive Positions in Location-Sharing ApplicationsMaria Luisa Damiani, Claudio Silvestri, Elisa Bertino

    Geosocial networking applications magnify the concern for location privacy because a user's position can be disclosed to diverse untrusted parties. The Privacy Preserving Obfuscation Environment (Probe) framework supports semantic-location cloaking to protect this information.

    View the Full Publication
  • 01/01/2011Approximate and exact hybrid algorithms for private nearest-neighbor queries with database protectionGabriel Ghinita, Panos Kalnis, Murat Kantarcioglu, Elisa Bertino

    Mobile devices with global positioning capabilities allow users to retrieve points of interest (POI) in their proximity. To protect user privacy, it is important not to disclose exact user coordinates to untrusted entities that provide location-based services. Currently, there are two main approaches to protect the location privacy of users: (i) hiding locations inside cloaking regions (CRs) and (ii) encrypting location data using private information retrieval (PIR) protocols. Previous work focused on finding good trade-offs between privacy and performance of user protection techniques, but disregarded the important issue of protecting the POI dataset D. For instance, location cloaking requires large-sized CRs, leading to excessive disclosure of POIs (O(|D|) in the worst case). PIR, on the other hand, reduces this bound to O(√|D|), but at the expense of high processing and communication overhead. We propose hybrid, two-step approaches for private location-based queries which provide protection for both the users and the database. In the first step, user locations are generalized to coarse-grained CRs which provide strong privacy. Next, a PIR protocol is applied with respect to the obtained query CR. To protect against excessive disclosure of POI locations, we devise two cryptographic protocols that privately evaluate whether a point is enclosed inside a rectangular region or a convex polygon. We also introduce algorithms to efficiently support PIR on dynamic POI sub-sets. We provide solutions for both approximate and exact NN queries. In the approximate case, our method discloses O(1) POI, orders of magnitude fewer than CR- or PIR-based techniques. For the exact case, we obtain optimal disclosure of a single POI, although with slightly higher computational overhead. Experimental results show that the hybrid approaches are scalable in practice, and outperform the pure-PIR approach in terms of computational and communication overhead.
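
    The first step of the hybrid scheme, generalizing an exact location to a coarse cloaking region, can be illustrated with a simple grid snap; the cell size and coordinates are assumptions, and the subsequent PIR step is omitted:

    ```python
    # Toy cloaking-region generalization: snap an exact coordinate to the
    # grid cell that contains it, returning the cell's bounding box.
    def cloak(x: float, y: float, cell: float = 1000.0):
        cx, cy = int(x // cell), int(y // cell)
        return (cx * cell, cy * cell, (cx + 1) * cell, (cy + 1) * cell)

    print(cloak(2750.0, 4120.0))  # (2000.0, 4000.0, 3000.0, 5000.0)
    ```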

    View the Full Publication
  • 01/01/2011Access Control for Databases: Concepts and SystemsElisa Bertino, Gabriel Ghinita, Ashish Kamra

    As organizations depend on, possibly distributed, information systems for operational, decisional and strategic activities, they are vulnerable to security breaches leading to data theft and unauthorized disclosures even as they gain productivity and efficiency advantages. Though several techniques, such as encryption and digital signatures, are available to protect data when transmitted across sites, a truly comprehensive approach for data protection must include mechanisms for enforcing access control policies based on data contents, subject qualifications and characteristics, and other relevant contextual information, such as time. It is well understood today that the semantics of data must be taken into account in order to specify effective access control policies. To address such requirements, over the years the database security research community has developed a number of access control techniques and mechanisms that are specific to database systems. In this article we present a comprehensive state of the art about models, systems and approaches proposed for specifying and enforcing access control policies in database management systems. In addition to surveying the foundational work in the area of access control for database systems, we present extensive case studies covering advanced features of current database management systems, such as the support for fine-grained and context-based access control, the support for mandatory access control, and approaches for protecting the data from insider threats. The article also covers novel approaches, based on cryptographic techniques, to enforce access control and surveys access control models for object-databases and XML data. For the reader not familiar with basic notions concerning access control and cryptography, we include a tutorial presentation on these notions. Finally, the article concludes with a discussion on current challenges for database access control and security, and preliminary approaches addressing some of these challenges.

    View the Full Publication
  • 01/01/2011A conditional purpose-based access control model with dynamic rolesMd. Enamul Kabir, Hua Wang, Elisa Bertino

    This paper presents a model for privacy-preserving access control which is based on a variety of purposes. Conditional purpose is applied along with allowed purpose and prohibited purpose in the model. It allows users to use certain data for a given purpose under specified conditions. The structure of the conditional purpose-based access control model is defined and investigated through dynamic roles. Access purpose is verified in a dynamic behavior, based on subject attributes, context attributes and authorization policies. Intended purposes are dynamically associated with the requested data object during the access decision. An algorithm is developed to achieve the compliance computation between access purposes and intended purposes, and is illustrated with role-based access control (RBAC) in a dynamic manner to support conditional purpose-based access control. According to this model, more information can be extracted from data providers while at the same time assuring privacy, which maximizes the usability of consumers’ data. It extends traditional access control models to a further coverage of privacy preservation in data mining environments. The structure helps enterprises to circulate clear privacy promises, and to collect and manage user preferences and consent.
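
    A toy version of the compliance computation between an access purpose and the intended purposes attached to a data object might look as follows; the three-way split and its decision order are assumptions for illustration:

    ```python
    # Illustrative compliance check: an access purpose is matched against the
    # prohibited, conditional, and allowed intended-purpose sets, in that order.
    def comply(access_purpose, allowed, conditional, prohibited):
        if access_purpose in prohibited:
            return "deny"
        if access_purpose in conditional:
            return "allow-with-conditions"   # e.g., only generalized values released
        if access_purpose in allowed:
            return "allow"
        return "deny"

    print(comply("marketing",
                 allowed={"billing"},
                 conditional={"marketing"},
                 prohibited={"third-party-sharing"}))  # allow-with-conditions
    ```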

    View the Full Publication
  • 01/01/2011xfACL: an extensible functional language for access controlQun Ni, Elisa Bertino

    The main goal of modern access control policy languages is to offer high-level languages, by using which security officers and application developers can express a large variety of access restrictions and isolate the security logic from the application logic. However, the current state-of-the-art language, XACML, suffers from some design flaws and lacks important features, such as those that characterize the RBAC model. Therefore, we propose an access control language that combines the benefits of both XACML and RBAC while avoiding their drawbacks.

    View the Full Publication
  • 01/01/2011PUF ROKs: a hardware approach to read-once keysMichael Kirkpatrick, Sam Kerr, Elisa Bertino

    Cryptographers have proposed the notion of read-once keys (ROKs) as a beneficial tool for a number of applications, such as delegation of authority. The premise of ROKs is that the key is destroyed by the process of reading it, thus preventing subsequent accesses. While the idea and the applications are well-understood, the consensus among cryptographers is that ROKs cannot be produced by algorithmic processes alone. Rather, a trusted hardware mechanism is needed to support the destruction of the key. In this work, we propose one such approach for using a hardware design to generate ROKs. Our approach is an application of physically unclonable functions (PUFs). PUFs use the intrinsic differences in hardware behavior to produce a random function that is unique to that hardware instance. Our design consists of incorporating the PUF in a feedback loop to make reading the key multiple times physically impossible.
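
    The read-once interface can be mimicked in software, though only as an analogy, since the paper's whole point is that genuine destruction-on-read requires hardware such as a PUF in a feedback loop:

    ```python
    # Software analogy only: software state can always be copied, so this is
    # not a true ROK. It merely mimics the interface (reading destroys the key).
    import os

    class ReadOnceKey:
        def __init__(self, nbytes: int = 16):
            self._key = os.urandom(nbytes)

        def read(self) -> bytes:
            if self._key is None:
                raise RuntimeError("key already consumed")
            key, self._key = self._key, None   # destroy on read
            return key

    rok = ReadOnceKey()
    print(len(rok.read()))   # 16
    try:
        rok.read()
    except RuntimeError as e:
        print(e)             # key already consumed
    ```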

    View the Full Publication
  • 01/01/2011Performance Analysis and Tuning of Automatically Parallelized OpenMP ApplicationsDheya Mustafa, Aurangzeb, Rudolf Eigenmann

    Automatic parallelization combined with tuning techniques is an alternative to manual parallelization of sequential programs to exploit the increased computational power that current multi-core systems offer. Automatic parallelization concentrates on finding any possible parallelism in the program, whereas tuning systems help identify efficient parallel code segments and serialize inefficient ones using runtime performance metrics. In this work we study the performance gap between automatically and hand-parallelized OpenMP applications and try to find whether this gap can be filled by compile-time techniques or whether it needs dynamic or user-interactive solutions. We implement an empirical tuning framework and propose an algorithm that partitions programs into sections and tunes each code section individually. Experiments show that tuned applications perform better than the original serial programs in the worst case and sometimes outperform hand-parallelized applications. Our work is one of the first approaches delivering an auto-parallelization system that guarantees performance improvements for nearly all programs; hence it eliminates the need for users to “experiment” with such tools in order to obtain the shortest runtime of their applications.
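
    The section-by-section tuning idea, in a toy form; the framework's real decision logic and metrics are richer, and the timed functions below are mere stand-ins for serial and parallelized section variants:

    ```python
    # Toy empirical tuning loop: time the serial and "parallel" variant of each
    # code section and keep the parallel variant only where it actually wins.
    import time

    def tune(sections):
        """sections: list of (name, serial_fn, parallel_fn); returns chosen variants."""
        plan = {}
        for name, serial, parallel in sections:
            timings = {}
            for label, fn in (("serial", serial), ("parallel", parallel)):
                start = time.perf_counter()
                fn()
                timings[label] = time.perf_counter() - start
            plan[name] = min(timings, key=timings.get)
        return plan

    busy = lambda n: sum(i * i for i in range(n))   # stand-in workload
    sections = [("loop_a", lambda: busy(200_000), lambda: busy(50_000)),   # parallel pays off
                ("loop_b", lambda: busy(1_000), lambda: busy(10_000))]     # overhead dominates
    print(tune(sections))   # {'loop_a': 'parallel', 'loop_b': 'serial'}
    ```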

    View the Full Publication
  • 01/01/2011U-MAP: a system for usage-based schema matching and mappingHazem Elmeleegy, Jaewoo Lee, El Kindi Rezig, Mourad Ouzzani, Ahmed Elmagarmid

    This demo shows how usage information buried in query logs can play a central role in data integration and data exchange. More specifically, our system U-MAP uses query logs to generate correspondences between the attributes of two different schemas and the complex mapping rules to transform and restructure data records from one of these schemas to another. We introduce several novel features showing the benefit of incorporating query log analysis into these key components of data integration and data exchange systems.

    View the Full Publication
  • 01/01/2011Leveraging query logs for schema mapping generation in U-MAPHazem Elmeleegy, Ahmed Elmagarmid, Jaewoo Lee

    In this paper, we introduce U-MAP, a new system for schema mapping generation. U-MAP builds upon and extends existing schema mapping techniques. However, it mitigates some key problems in this area, which have not been previously addressed. The key tenet of U-MAP is to exploit the usage information extracted from the query logs associated with the schemas being mapped. We describe our experience in applying our proposed system to realistic datasets from the retail and life sciences domains. Our results demonstrate the effectiveness and efficiency of U-MAP compared to traditional approaches.

    View the Full Publication
  • 01/01/2011Omnify: Investigating the Visibility and Effectiveness of Copyright MonitorsRahul Potharaju, Jeff Seibert, Sonia Fahmy, Cristina Nita-Rotaru

    The arms race between copyright agencies and P2P users is an ongoing and evolving struggle. On the one hand, content providers are using several techniques to stealthily find unauthorized distribution of copyrighted work in order to deal with the problem of Internet piracy. On the other hand, P2P users are relying increasingly on blacklists and anonymization methods in order to avoid detection. In this work, we propose a number of techniques to reveal copyright monitors’ current approaches and evaluate their effectiveness. We apply these techniques on data we collected from more than 2.75 million BitTorrent swarms containing 71 million IP addresses. We provide strong evidence that certain nodes are indeed copyright monitors, show that monitoring is a world-wide phenomenon, and devise a methodology for generating blacklists for paranoid and conservative P2P users.

    View the Full Publication
  • 01/01/2011Identification and analysis of common bean (Phaseolus vulgaris L.) transcriptomes by massively parallel pyrosequencingVenu Kalavacharla, Zhanji Liu, Blake C. Meyers, Jyothi Thimmapuram, Kalpalatha Melmaiee

    Common bean (Phaseolus vulgaris) is the most important food legume in the world. Although this crop is very important to both the developed and developing world as a means of dietary protein supply, resources available in common bean are limited. Global transcriptome analysis is important to better understand gene expression, genetic variation, and gene structure annotation in addition to other important features. However, the number and description of common bean sequences are very limited, which greatly inhibits genome and transcriptome research. Here we used 454 pyrosequencing to obtain a substantial transcriptome dataset for common bean.

    View the Full Publication
  • 12/01/2010Discriminative Graphical Models for Joint Faculty Homepage DiscoveryYi Fang, Luo Si, Aditya P. Mathur

    Faculty homepage discovery is an important step toward building an academic portal. Although the general homepage finding tasks have been well studied (e.g., TREC-2001 Web Track), faculty homepage discovery has its own special characteristics and not much focused research has been conducted for this task. In this paper, we view faculty homepage discovery as text categorization problems by utilizing Yahoo BOSS API to generate a small list of high-quality candidate homepages. Because the labels of these pages are not independent, standard text categorization methods such as logistic regression, which classify each page separately, are not well suited for this task. By defining homepage dependence graph, we propose a conditional undirected graphical model to make joint predictions by capturing the dependence of the decisions on all the candidate pages. Three cases of dependencies among faculty candidate homepages are considered for constructing the graphical model. Our model utilizes a discriminative approach so that any informative features can be used conveniently. Learning and inference can be done relatively efficiently for the joint prediction model because the homepage dependence graphs resulting from the three cases of dependencies are not densely connected. An extensive set of experiments have been conducted on two testbeds to show the effectiveness of the proposed discriminative graphical model.

    View the Full Publication
  • 11/02/2010Endogenous siRNAs and noncoding RNA-derived small RNAs are expressed in adult mouse hippocampus and are up-regulated in olfactory discrimination training.Neil Smalheiser, G Lugli, Jyothi Thimmapuram, E.H. Cook, J Larson

    We previously proposed that endogenous siRNAs may regulate synaptic plasticity and long-term gene expression in the mammalian brain. Here, a hippocampal-dependent task was employed in which adult mice were trained to execute a nose-poke in a port containing one of two simultaneously present odors in order to obtain a reward. Mice demonstrating olfactory discrimination training were compared to pseudo-training and nose-poke control groups; size-selected hippocampal RNA was subjected to Illumina deep sequencing. Sequences that aligned uniquely and exactly to the genome without uncertain nucleotide assignments, within exons or introns of MGI annotated genes, were examined further. The data confirm that small RNAs having features of endogenous siRNAs are expressed in brain; that many of them derive from genes that regulate synaptic plasticity (and have been implicated in neuropsychiatric diseases); and that hairpin-derived endo-siRNAs and the 20- to 23-nt size class of small RNAs show a significant increase during an early stage of training. The most abundant putative siRNAs arose from an intronic inverted repeat within the SynGAP1 locus; this inverted repeat was a substrate for dicer in vitro, and SynGAP1 siRNA was specifically associated with Argonaute proteins in vivo. Unexpectedly, a dramatic increase with training (more than 100-fold) was observed for a class of 25- to 30-nt small RNAs derived from specific sites within snoRNAs and abundant noncoding RNAs (Y1 RNA, RNA component of mitochondrial RNAse P, 28S rRNA, and 18S rRNA). Further studies are warranted to characterize the role(s) played by endogenous siRNAs and noncoding RNA-derived small RNAs in learning and memory.

    View the Full Publication
  • 11/01/2010OpenMPC: Extended OpenMP Programming and Tuning for GPUsSeyong Lee, Rudolf Eigenmann

    General-Purpose Graphics Processing Units (GPGPUs) are promising parallel platforms for high performance computing. The CUDA (Compute Unified Device Architecture) programming model provides improved programmability for general computing on GPGPUs. However, its unique execution model and memory model still pose significant challenges for developers of efficient GPGPU code. This paper proposes a new programming interface, called OpenMPC, which builds on OpenMP to provide an abstraction of the complex CUDA programming model and offers high-level controls of the involved parameters and optimizations. We have developed a fully automatic compilation and user-assisted tuning system supporting OpenMPC. In addition to a range of compiler transformations and optimizations, the system includes tuning capabilities for generating, pruning, and navigating the search space of compilation variants. Our results demonstrate that OpenMPC offers both programmability and tunability. Our system achieves 88% of the performance of the hand-coded CUDA programs.

    View the Full Publication
  • 09/01/2010A General Framework for Web Content FilteringElisa Bertino, Elena Ferrari, Andrea Perego

    Web content filtering is a means to make end-users aware of the 'quality' of Web resources by evaluating their contents and/or characteristics against users' preferences. Although they can be used for a variety of purposes, Web content filtering tools are mainly deployed as a service for parental control purposes, and for regulating the access to Web content by users connected to the networks of enterprises, libraries, schools, etc. Current Web filtering tools are based on well established techniques, such as data mining and firewall blocking, and they typically cater to the filtering requirements of very specific end-user categories. Therefore, what is lacking is a unified filtering framework able to support all the possible application domains, and making it possible to enforce interoperability among the different filtering approaches and the systems based on them. In this paper, a multi-strategy approach is described, which integrates the available techniques and focuses on the use of metadata for rating and filtering Web information. Such an approach consists of a filtering meta-model, referred to as MFM (Multi-strategy Filtering Model), which provides a general representation of the Web content filtering domain, independently from its possible applications, and of two prototype implementations, partially carried out in the framework of the EU projects EUFORBIA and QUATRO, and designed for different application domains: user protection and Web quality assurance, respectively.

    View the Full Publication
  • 08/01/2010Efficient Privacy-Preserving Similar Document DetectionMummoorthy Murugesan, Wei Jiang, Chris W. Clifton, Luo Si, Jaideep Vaidya

    Similar document detection plays important roles in many applications, such as file management, copyright protection, plagiarism prevention, and duplicate submission detection. The state of the art protocols assume that the contents of files stored on a server (or multiple servers) are directly accessible. However, this makes such protocols unsuitable for any environment where the documents themselves are sensitive and cannot be openly read. Essentially, this assumption limits more practical applications, e.g., detecting plagiarized documents between two conferences, where submissions are confidential. We propose novel protocols to detect similar documents between two entities where documents cannot be openly shared with each other. The similarity measure used can be a simple cosine similarity on entire documents or on document fragments, enabling detection of partial copying. We conduct extensive experiments to show the practical value of the proposed protocols. While the proposed base protocols are much more efficient than the general secure multiparty computation based solutions, they are still slow for large document sets. We then investigate a clustering based approach that significantly reduces the running time and achieves over 90% of accuracy in our experiments. This makes secure similar document detection both practical and feasible.
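
    The underlying similarity measure, in plain non-private form, is ordinary cosine similarity over term vectors; applying it to fragments illustrates partial-copy detection, while the secure protocols themselves are beyond this sketch:

    ```python
    # Plain (non-private) cosine similarity over term-frequency vectors,
    # applied to a document fragment to illustrate partial-copy detection.
    from collections import Counter
    from math import sqrt

    def cosine(a: str, b: str) -> float:
        va, vb = Counter(a.split()), Counter(b.split())
        dot = sum(va[t] * vb[t] for t in va)
        norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
        return dot / norm if norm else 0.0

    doc = "the quick brown fox jumps over the lazy dog"
    fragment = "quick brown fox"
    print(round(cosine(doc, fragment), 3))   # high overlap on the copied fragment
    ```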

    View the Full Publication
  • 07/01/2010Privacy-aware role-based access controlQun Ni, Elisa Bertino, Jorge Lobo, Carolyn Brodie

    In this article, we introduce a comprehensive framework supporting a privacy-aware access control mechanism, that is, a mechanism tailored to enforce access control to data containing personally identifiable information and, as such, privacy sensitive. The key component of the framework is a family of models (P-RBAC) that extend the well-known RBAC model in order to provide full support for expressing highly complex privacy-related policies, taking into account features like purposes and obligations. We formally define the notion of privacy-aware permissions and the notion of conflicting permission assignments in P-RBAC, together with efficient conflict-checking algorithms. The framework also includes a flexible authoring tool, based on the use of the SPARCLE system, supporting the high-level specification of P-RBAC permissions. SPARCLE supports the use of natural language for authoring policies and is able to automatically generate P-RBAC permissions from these natural language specifications. In the article, we also report performance evaluation results and contrast our approach with other relevant access control and privacy policy frameworks such as P3P, EPAL, and XACML.

    View the Full Publication
  • 06/01/2010SimDB: a similarity-aware database systemYasin Silva, Ahmed Aly, Walid G. Aref, Per-Ake Larson

    The identification and processing of similarities in the data play a key role in multiple application scenarios. Several types of similarity-aware operations have been studied in the literature. However, in most of the previous work, similarity-aware operations are studied in isolation from other regular or similarity-aware operations. Furthermore, most of the previous research in the area considers a standalone implementation, i.e., without any integration with a database system. In this demonstration we present SimDB, a similarity-aware database management system. SimDB supports multiple similarity-aware operations as first-class database operators. We describe the architectural changes to implement the similarity-aware operators. In particular, we present the way conventional operators' implementation machinery is extended to support similarity-aware operators. We also show how these operators interact with other similarity-aware and regular operators. In particular, we show the effectiveness of multiple equivalence rules that can be used to extend cost-based query optimization to the case of similarity-aware operations.

    View the Full Publication
  • 06/01/2010An optimal bandwidth allocation and data droppage scheme for differentiated services in a wireless networkWaseem Sheikh, Arif Ghafoor

    This paper presents an optimal proportional bandwidth allocation and data droppage scheme to provide differentiated services (DiffServ) for downlink pre-orchestrated multimedia data in a single-hop wireless network. The proposed resource allocation scheme finds the optimal bandwidth allocation and data drop rates under minimum quality-of-service (QoS) constraints. It combines the desirable attributes of relative DiffServ and absolute DiffServ approaches. In contrast to relative DiffServ approach, the proposed scheme guarantees the minimum amount of bandwidth provided to each user without dropping any data at the base-station, when the network has sufficient resources. If the network does not have sufficient resources to provide minimum bandwidth guarantees to all users without dropping data, the proportional data dropper finds the optimal data drop rates within acceptable levels of QoS and thus avoids the inflexibility of absolute DiffServ approach. The optimal bandwidth allocation and data droppage problems are formulated as constrained nonlinear optimization problems and solved using efficient techniques. Simulations are performed to show that the proposed scheme exhibits the desirable features of absolute and relative DiffServ.

    View the Full Publication
  • 06/01/2010Probabilistic Answer Ranking Models for Multilingual QA: Independent Prediction Model and Joint Prediction ModelJeongwoo Ko, Luo Si, Eric Nyberg, Teruko Mitamura

    This article presents two probabilistic models for answer ranking in the multilingual question-answering (QA) task, which finds exact answers to a natural language question written in different languages. Although some probabilistic methods have been utilized in traditional monolingual answer ranking, limited prior research has been conducted for answer ranking in multilingual question answering with formal methods. This article first describes a probabilistic model that predicts the probabilities of correctness for individual answers in an independent way. It then proposes a novel probabilistic method to jointly predict the correctness of answers by considering both the correctness of individual answers as well as their correlations. As far as we know, this is the first probabilistic framework that proposes to model the correctness and correlation of answer candidates in multilingual question answering and provide a novel approach to design a flexible and extensible system architecture for answer selection in multilingual QA. An extensive set of experiments was conducted to show the effectiveness of the proposed probabilistic methods in English-to-Chinese and English-to-Japanese cross-lingual QA, as well as English, Chinese, and Japanese monolingual QA using TREC and NTCIR questions.

    View the Full Publication
  • 04/01/2010Micro-Blogging in Classroom: Classifying Students’ Relevant and Irrelevant Questions in a Micro-Blogging Supported ClassroomSuleyman Cetintas, Luo Si, Hans Aagard, Kyle Bowen, M Cordova-Sanchez

    Micro-blogging is a popular technology in social networking applications that lets users publish online short text messages (e.g., less than 200 characters) in real time via the web, SMS, instant messaging clients, etc. Micro-blogging can be an effective tool in the classroom and has lately gained notable interest from the education community. This paper proposes a novel application of text categorization for two types of micro-blogging questions asked in a classroom, namely relevant (i.e., questions that the teacher wants to address in the class) and irrelevant questions. Empirical results and analysis show that using personalization together with question text leads to better categorization accuracy than using question text alone. It is also beneficial to utilize the correlation between questions and available lecture materials as well as the correlation between questions asked in a lecture. Furthermore, empirical results also show that the elimination of stopwords leads to better correlation estimation between questions and leads to better categorization accuracy. On the other hand, incorporating students' votes on the questions does not improve categorization accuracy, although a similar feature has been shown to be effective in community question answering environments for assessing question quality.
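
    A hedged sketch of such a categorization setup, combining question text with a simple personalization signal; the feature choice, the toy data, and the asker-history feature are invented for illustration:

    ```python
    # Binary relevant/irrelevant question classifier: TF-IDF question text plus
    # an assumed personalization feature (the asker's past relevant-question rate).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from scipy.sparse import hstack, csr_matrix

    questions = ["what does slide 3 mean", "anyone up for lunch",
                 "can you rederive the proof", "great game last night"]
    asker_relevant_rate = [[0.8], [0.1], [0.9], [0.2]]   # personalization feature
    labels = [1, 0, 1, 0]                                # 1 = relevant to the lecture

    vec = TfidfVectorizer(stop_words="english")   # the paper found stopword elimination helpful
    X = hstack([vec.fit_transform(questions), csr_matrix(asker_relevant_rate)])
    clf = LogisticRegression().fit(X, labels)

    q = ["what does the proof on slide 3 mean"]
    x = hstack([vec.transform(q), csr_matrix([[0.8]])])
    print(clf.predict(x))   # likely [1] on this toy data
    ```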

    View the Full Publication
  • 03/01/2010The similarity join database operatorYasin Silva, Walid G. Aref, Mohamed Ali

    Similarity joins have been studied as key operations in multiple application domains, e.g., record linkage, data cleaning, multimedia and video applications, and phenomena detection on sensor networks. Multiple similarity join algorithms and implementation techniques have been proposed. They range from out-of-database approaches for only in-memory and external memory data to techniques that make use of standard database operators to answer similarity joins. Unfortunately, there has not been much study on the role and implementation of similarity joins as database physical operators. In this paper, we focus on the study of similarity joins as first-class database operators. We present the definition of several similarity join operators and study the way they interact among themselves, with other standard database operators, and with other previously proposed similarity-aware operators. In particular, we present multiple transformation rules that enable similarity query optimization through the generation of equivalent similarity query execution plans. We then describe an efficient implementation of two similarity join operators, Ɛ-Join and Join-Around, as core DBMS operators. The performance evaluation of the implemented operators in PostgreSQL shows that they have good execution time and scalability properties. The execution time of Join-Around is less than 5% of that of the equivalent query that uses only regular operators, while Ɛ-Join's execution time is 20% to 90% of that of its equivalent regular-operator-based query for the useful case of small Ɛ (0.01% to 10% of the domain range). We also show experimentally that the proposed transformation rules can generate plans with execution times that are only 10% to 70% of those of the initial query plans.
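
    The semantics of an Ɛ-join can be illustrated outside the engine with a simple sort-based sweep over numeric keys; this mirrors only the operator's output, not the paper's in-DBMS implementation:

    ```python
    # Toy epsilon-join: pair up tuples from two inputs whose numeric keys
    # differ by at most eps, using sorted inputs and a sliding start pointer.
    def epsilon_join(r, s, eps):
        r, s = sorted(r), sorted(s)
        out, start = [], 0
        for a in r:
            while start < len(s) and s[start] < a - eps:
                start += 1                     # s[start] can never match later a's
            i = start
            while i < len(s) and s[i] <= a + eps:
                out.append((a, s[i]))
                i += 1
        return out

    print(epsilon_join([1.0, 2.5, 9.0], [0.8, 2.6, 5.0], eps=0.2))
    # [(1.0, 0.8), (2.5, 2.6)]
    ```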

    View the Full Publication
  • 03/01/2010Supporting real-world activities in database management systemsMohamed Eltabakh, Walid G. Aref, Ahmed Elmagarmid, Yasin Silva, Mourad Ouzzani

    The cycle of processing the data in many application domains is complex and may involve real-world activities that are external to the database, e.g., wet-lab experiments, instrument readings, and manual measurements. These real-world activities may take long time to prepare for and to perform, and hence introduce inherently long time delays between the updates in the database. The presence of these long delays between the updates, along with the need for the intermediate results to be instantly available, makes supporting real-world activities in the database engine a challenging task. In this paper, we address these challenges through a system that enables users to reflect their updates immediately into the database while keeping track of the dependent and potentially invalid data items until they are re-validated. The proposed system includes: (1) semantics and syntax for interfaces through which users can express the dependencies among data items, (2) new operators to alert users when the returned query results contain potentially invalid or out-of-date data, and to enable evaluating queries on either valid data only, or both valid and potentially invalid data, and (3) mechanisms for data invalidation and revalidation. The proposed system is being realized via extensions to PostgreSQL.
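
    A sketch of the invalidation mechanism over a dependency graph; the item names and the `update`/`revalidate` API are invented for illustration:

    ```python
    # Toy invalidation propagation: when a value feeding a real-world activity
    # changes, mark all transitive dependents potentially invalid until the
    # activity's output revalidates them.
    from collections import defaultdict, deque

    depends_on = {"blood_result": ["sample_id"], "diagnosis": ["blood_result"]}
    dependents = defaultdict(list)
    for item, sources in depends_on.items():
        for s in sources:
            dependents[s].append(item)

    invalid = set()

    def update(item):
        """An update invalidates all transitive dependents until revalidated."""
        queue = deque(dependents[item])
        while queue:
            d = queue.popleft()
            if d not in invalid:
                invalid.add(d)
                queue.extend(dependents[d])

    def revalidate(item):
        invalid.discard(item)   # e.g., after the wet-lab experiment re-runs

    update("sample_id")
    print(sorted(invalid))   # ['blood_result', 'diagnosis']
    ```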

    View the Full Publication
  • 02/01/2010Supporting views in data stream management systemsThanaa Ghanem, Ahmed Elmagarmid, Per-Ake Larson, Walid G. Aref

    In relational database management systems, views supplement basic query constructs to cope with the demand for “higher-level” views of data. Moreover, in traditional query optimization, answering a query using a set of existing materialized views can yield a more efficient query execution plan. Due to their effectiveness, views are attractive to data stream management systems. In order to support views over streams, a data stream management system should employ a closed (or composable) continuous query language. A closed query language is a language in which query inputs and outputs are interpreted in the same way, hence allowing query composition. This article introduces the Synchronized SQL (or SyncSQL) query language that defines a data stream as a sequence of modify operations against a relation. SyncSQL enables query composition through the unified interpretation of query inputs and outputs. An important issue in continuous queries over data streams is the frequency by which the answer gets refreshed and the conditions that trigger the refresh. Coarser periodic refresh requirements are typically expressed as sliding windows. In this article, the sliding window approach is generalized by introducing the synchronization principle that empowers SyncSQL with a formal mechanism to express queries with arbitrary refresh conditions. After introducing the semantics and syntax, we lay the algebraic foundation for SyncSQL and propose a query-matching algorithm for deciding containment of SyncSQL expressions. Then, the article introduces the Nile-SyncSQL prototype to support SyncSQL queries. Nile-SyncSQL employs a pipelined incremental evaluation paradigm in which the query pipeline consists of a set of differential operators. A cost model is developed to estimate the cost of SyncSQL query execution pipelines and to choose the best execution plan from a set of different plans for the same query. An experimental study is conducted to evaluate the performance of Nile-SyncSQL. The experimental results illustrate the effectiveness of Nile-SyncSQL and the significant performance gains when views are enabled in data stream management systems.

    View the Full Publication
  • 01/01/2010A Visual Analytics Approach to Understanding Spatiotemporal HotspotsRoss Maciejewski, Stephen Rudolph, Ryan Hafen, Ahmad Abusalah, Mohamed Yakout, Mourad Ouzzani, William S. Cleveland, Shaun Grannis, David S. Ebert

    As data sources become larger and more complex, the ability to effectively explore and analyze patterns amongst varying sources becomes a critical bottleneck in analytic reasoning. Incoming data contains multiple variables, high signal to noise ratio, and a degree of uncertainty, all of which hinder exploration, hypothesis generation/exploration, and decision making. To facilitate the exploration of such data, advanced tool sets are needed that allow the user to interact with their data in a visual environment that provides direct analytic capability for finding data aberrations or hotspots. In this paper, we present a suite of tools designed to facilitate the exploration of spatiotemporal datasets. Our system allows users to search for hotspots in both space and time, combining linked views and interactive filtering to provide users with contextual information about their data and allow the user to develop and explore their hypotheses. Statistical data models and alert detection algorithms are provided to help draw user attention to critical areas. Demographic filtering can then be further applied as hypotheses generated become fine tuned. This paper demonstrates the use of such tools on multiple geo-spatiotemporal datasets.

    View the Full Publication
  • 01/01/2010Spatio-Temporal Access Methods: Part 2 (2003 - 2010)Long-Van Nguyen-Dinh, Walid G. Aref, Mohamed Mokbel

    In spatio-temporal applications, moving objects detect their locations via location-aware devices and update their locations continuously to the server. With the ubiquity and massive numbers of moving objects, many spatio-temporal access methods have been developed to process user queries efficiently. Spatio-temporal access methods are classified into four categories: (1) Indexing the past data, (2) Indexing the current data, (3) Indexing the future data, and (4) Indexing data at all points of time. This short survey is Part 2 of our previous work [28]. In Part 2, we give an overview and classification of the spatio-temporal access methods published between 2003 and 2010.

    View the Full Publication
  • 01/01/2010A database server for next-generation scientific data managementMohamed Eltabakh, Walid G. Aref, Ahmed Elmagarmid

    The growth of scientific information and the increasing automation of data collection have made databases integral to many scientific disciplines including life sciences, physics, meteorology, earth and atmospheric sciences, and chemistry. These sciences pose new data management challenges to current database system technologies. This dissertation addresses the following three challenges: (1) Annotation Management: Annotations and provenance information are important metadata that go hand-in-hand with scientific data. Annotating scientific data represents a vital mechanism for scientists to share knowledge and build an interactive and collaborative environment. A major challenge is: How to manage large volumes of annotations, especially at various granularities, e.g., cell, column, and row level annotations, along with their corresponding data items. (2) Complex Dependencies Involving Real-world Activities: The processing of scientific data is a complex cycle that may involve sequences of activities external to the database system, e.g., wet-lab experiments, instrument readings, and manual measurements. These external activities may incur inherently long delays to prepare for and to conduct. Updating a database value may render parts of the database inconsistent until some external activity is executed and its output is reflected back and updated into the database. The challenge is: How to integrate these external activities within the database engine and accommodate the long delays between the updates while making the intermediate results instantly available for querying. (3) Fast Access to Scientific Data with Complex Data Types: Scientific experiments produce large volumes of data of complex types, e.g., arrays, images, long sequences, and multi-dimensional data. A major challenge is: How to provide fast access to these large pools of scientific data with non-traditional data types. In this dissertation, I present extensions to current database engines to address the above challenges. The proposed extensions enable scientific data to be stored and processed within their natural habitat: the database system. Experimental studies and performance analysis for all the proposed algorithms are carried out using both real-world and synthetic datasets. Our results show the applicability of the proposed extensions and their performance gains over other existing techniques and algorithms.

    View the Full Publication
  • 01/01/2010A Privacy-Enhancing Content-Based Publish/Subscribe System Using Scalar Product Preserving TransformationsSunoh Choi, Gabriel Ghinita, Elisa Bertino

    Users of content-based publish/subscribe systems (CBPS) are interested in receiving data items with values that satisfy certain conditions. Each user submits a list of subscription specifications to a broker, which routes data items from publishers to users. When a broker receives a notification that contains a value from a publisher, it forwards it only to the subscribers whose requests match the value. However, in many applications, the data published are confidential, and their contents must not be revealed to brokers. Furthermore, a user’s subscription may contain sensitive information that must be protected from brokers. Therefore, a difficult challenge arises: how to route publisher data to the appropriate subscribers without the intermediate brokers learning the plain text values of the notifications and subscriptions. To that end, brokers must be able to perform operations on top of the encrypted contents of subscriptions and notifications. Such operations may be as simple as equality match, but often require more complex operations such as determining inclusion of data in a value interval. Previous work attempted to solve this problem by using one-way data mappings or specialized encryption functions that allow evaluation of conditions on ciphertexts. However, such operations are computationally expensive, and the resulting CBPS lack scalability. As fast dissemination is an important requirement in many applications, we focus on a new data transformation method called Asymmetric Scalar-product Preserving Encryption (ASPE) [1]. We devise methods that build upon ASPE to support private evaluation of several types of conditions. We also suggest techniques for secure aggregation of notifications, supporting functions such as sum, minimum, maximum and count. Our experimental evaluation shows that ASPE-based CBPS incurs 65% less overhead for exact-match filtering and 50% less overhead for range filtering compared to the state-of-the-art.
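
    The matching capability rests on a simple algebraic identity, sketched below with numpy: encode data vectors with a secret invertible matrix M and queries with its inverse, and scalar products are preserved. This is only the bare identity behind ASPE-style schemes; the actual construction in [1] adds dimension splitting and randomization, which this sketch omits.

    ```python
    # Bare scalar-product-preserving identity: (M^T x) . (M^-1 q) = x . q
    import numpy as np

    rng = np.random.default_rng(0)
    d = 4
    M = rng.standard_normal((d, d))        # secret key matrix (invertible w.h.p.)
    M_inv = np.linalg.inv(M)

    notification = rng.standard_normal(d)  # publisher's plaintext vector
    subscription = rng.standard_normal(d)  # subscriber's plaintext vector

    enc_notif = M.T @ notification         # what the broker sees
    enc_sub = M_inv @ subscription

    # The broker evaluates the scalar product without seeing plaintexts:
    assert np.isclose(enc_notif @ enc_sub, notification @ subscription)
    ```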

    View the Full Publication
  • 01/01/2010A privacy-preserving approach to policy-based content disseminationNing Shang, Mohamed Nabeel, Federica Paci, Elisa Bertino

    We propose a novel scheme for selective distribution of content, encoded as documents, that preserves the privacy of the users to whom the documents are delivered and is based on an efficient and novel group key management scheme. Our document broadcasting approach is based on access control policies specifying which users can access which documents, or subdocuments. Based on such policies, a broadcast document is segmented into multiple subdocuments, each encrypted with a different key. In line with modern attribute-based access control, policies are specified against identity attributes of users. However, our broadcasting approach is privacy-preserving in that users are granted access to a specific document, or subdocument, according to the policies without the need to provide their identity attributes in the clear to the document publisher. Under our approach, not only does the document publisher not learn the values of the identity attributes of users, but it also does not learn which policy conditions are verified by which users; thus, inferences about the values of identity attributes are prevented. Moreover, our key management scheme on which the proposed broadcasting approach is based is efficient in that it does not require sending the decryption keys to the users along with the encrypted document. Users are able to reconstruct the keys to decrypt the authorized portions of a document based on subscription information they have received from the document publisher. The scheme also efficiently handles new user subscriptions and the revocation of existing subscriptions.
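
    The subdocument-level encryption step can be pictured with the toy sketch below; the policy names are hypothetical, and a keystream XOR stands in for real encryption and for the paper's group key management protocol. The point is only that each policy maps to its own key, so a subscriber who can reconstruct some keys decrypts exactly the corresponding portions.

    ```python
    # Toy per-policy keys over subdocuments (illustrative only).
    import hashlib, itertools

    def derive_key(policy_id, master_secret):
        return hashlib.sha256(master_secret + policy_id.encode()).digest()

    def xor_bytes(data, key):
        return bytes(b ^ k for b, k in zip(data, itertools.cycle(key)))

    master = b"publisher-master-secret"
    subdocs = [("policy:role=doctor", b"diagnosis section"),
               ("policy:role=billing", b"payment section")]

    encrypted = [(p, xor_bytes(text, derive_key(p, master))) for p, text in subdocs]

    # A subscriber who can reconstruct only the doctor key recovers one part:
    doctor_key = derive_key("policy:role=doctor", master)
    print(xor_bytes(encrypted[0][1], doctor_key))  # b'diagnosis section'
    ```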

    View the Full Publication
  • 01/01/2010A reciprocal framework for spatial K-anonymityGabriel Ghinita, Keliang Zhao, Dimitris Papadias, Panos Kalnis

    Spatial K-anonymity (SKA) exploits the concept of K-anonymity in order to protect the identity of users from location-based attacks. The main idea of SKA is to replace the exact location of a user U with an anonymizing spatial region (ASR) that contains at least K-1 other users, so that an attacker can pinpoint U with probability at most 1/K. Simply generating an ASR that includes K users does not guarantee SKA. Previous work defined the reciprocity property as a sufficient condition for SKA. However, the only existing reciprocal method, Hilbert Cloak, relies on a specialized data structure. In contrast, we propose a general framework for implementing reciprocal algorithms using any existing spatial index on the user locations. We discuss ASR construction methods with different tradeoffs on effectiveness (i.e., ASR size) and efficiency (i.e., construction cost). Then, we present case studies of applying our framework on top of two popular spatial indices (namely, R*-trees and Quad-trees). Finally, we consider the case where the attacker knows the query patterns of each user. The experimental results verify that our methods outperform Hilbert Cloak. Moreover, since we employ general-purpose spatial indices, the proposed system is not limited to anonymization, but supports conventional spatial queries as well.
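
    As a minimal sketch of how reciprocity can be obtained on top of an ordering of users (assumed here to be presorted by a space-filling-curve index, as in Hilbert Cloak), partition the users into buckets of at least K and give every member of a bucket the same ASR, namely the bucket's MBR; since all members share the region, reciprocity holds. The bucketing below is illustrative, not the paper's index-based construction.

    ```python
    # Reciprocal cloaking by fixed-size bucketing (illustrative sketch).

    def reciprocal_buckets(users, K):
        """users: list of (user_id, x, y), presorted by a space-filling index."""
        buckets = [users[i:i + K] for i in range(0, len(users), K)]
        if len(buckets) > 1 and len(buckets[-1]) < K:   # merge undersized tail
            buckets[-2].extend(buckets.pop())
        return buckets

    def asr(bucket):
        """Anonymizing spatial region = bucket MBR."""
        xs = [x for _, x, _ in bucket]
        ys = [y for _, _, y in bucket]
        return (min(xs), min(ys), max(xs), max(ys))

    users = [("u1", 1, 1), ("u2", 2, 3), ("u3", 3, 2), ("u4", 8, 9), ("u5", 9, 8)]
    for bucket in reciprocal_buckets(users, K=2):
        print([uid for uid, _, _ in bucket], "->", asr(bucket))
    ```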

    View the Full Publication
  • 01/01/2010A secure multiparty computation privacy preserving OLAP framework over distributed XML dataAlfredo Cuzzocrea, Elisa Bertino

    Privacy Preserving Distributed OLAP is becoming a critical challenge for next-generation Business Intelligence (BI) scenarios, due to the "natural suitability" of OLAP in analyzing distributed massive BI repositories in a multidimensional and multigranularity manner. In particular, in these scenarios XML-formatted BI repositories play a dominant role, due to the well-known amenities of XML in modeling and representing distributed business data. However, while Privacy Preserving Distributed Data Mining has been widely investigated, very few efforts have focused on the problem of effectively and efficiently supporting privacy preserving OLAP over distributed collections of XML documents. In order to fill this gap, we propose a novel Secure Multiparty Computation (SMC)-based privacy preserving OLAP framework for distributed collections of XML documents. The framework has many novel features ranging from nice theoretical properties to an effective and efficient protocol. The efficiency of our approach has been validated by an experimental evaluation over distributed collections of synthetic XML documents.

    View the Full Publication
  • 01/01/2010A Two-phase framework for quality-aware Web service selectionQi Yu, Manjeet Rege, Athman Bouguettaya, Brahim Medjahed

    Service-oriented computing is gaining momentum as the next technological tool to leverage the huge investments in Web application development. The expected large number of Web services poses a set of new challenges for efficiently accessing these services. We propose an integrated service query framework that facilitates users in accessing their desired services. The framework incorporates a service query model and a two-phase optimization strategy. The query model defines service communities that are used to organize the large and heterogeneous service space. The service communities allow users to use declarative queries to retrieve their desired services without worrying about the underlying technical details. The two-phase optimization strategy automatically generates feasible service execution plans and selects the plan with the best user-desired quality. In particular, we present an evolutionary algorithm that is able to “co-evolve” multiple feasible execution plans simultaneously and allows them to compete with each other to generate the best plan. We conduct a set of experiments to assess the performance of the proposed algorithms.

    View the Full Publication
  • 01/01/2010Assuring Data Trustworthiness - Concepts and Research ChallengesElisa Bertino, Hyo-Sang Lim

    Today, more than ever, there is a critical need to share data within and across organizations so that analysts and decision makers can analyze and mine the data, and make effective decisions. However, in order for analysts and decision makers to produce accurate analysis and make effective decisions and take actions, data must be trustworthy. Therefore, it is critical that data trustworthiness issues, which also include data quality, provenance and lineage, be investigated for organizational data sharing, situation assessment, multi-sensor data integration and numerous other functions to support decision makers and analysts. The problem of providing trustworthy data to users is an inherently difficult problem that requires articulated solutions combining different methods and techniques. In the paper we first elaborate on the data trustworthiness challenge and discuss a framework to address this challenge. We then present an initial approach for assessing the trustworthiness of streaming data and discuss open research directions.

    View the Full Publication
  • 01/01/2010Biometrics-based identifiers for digital identity managementAbhilasha Bhargav-Spantzel, Anna Squicciarini, Elisa Bertino, Xiangwei Kong

    We present algorithms to reliably generate biometric identifiers from a user's biometric image; the identifiers are in turn used for identity verification, possibly in conjunction with cryptographic keys. The biometric identifier generation algorithms employ image hashing functions using singular value decomposition and support vector classification techniques. Our algorithms capture generic biometric features that ensure unique and repeatable biometric identifiers. We provide an empirical evaluation of our techniques using 2569 images of 488 different individuals for three types of biometric images, namely fingerprint, iris and face. Based on the biometric type and the classification models, the empirical evaluation shows that we can generate biometric identifiers ranging from 64 bits up to 214 bits. We provide an example use of the biometric identifiers in privacy-preserving multi-factor identity verification based on zero-knowledge proofs. Therefore, several identity verification factors, including various traditional identity attributes, can be used in conjunction with one or more biometrics of the individual to provide strong identity verification. We also ensure security and privacy of the biometric data. More specifically, we analyze several attack scenarios. We assure privacy of the biometric using the one-way hashing property, in that no information about the original biometric image is revealed from the biometric identifier.
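
    The image-hashing ingredient can be sketched as follows; this is a simplified stand-in (the paper pairs such hashes with support vector classification), but it shows how a short, largely noise-tolerant bit string can be derived from the leading singular values of a biometric image.

    ```python
    # SVD-based image hashing sketch: quantize leading singular values to bits.
    import numpy as np

    def svd_bits(image, k=16):
        s = np.linalg.svd(image, compute_uv=False)[:k]  # leading singular values
        s = s / s.sum()                                 # normalize
        return (s > np.median(s)).astype(int)           # quantize to bits

    rng = np.random.default_rng(1)
    img = rng.random((64, 64))
    noisy = img + 0.01 * rng.standard_normal((64, 64))  # slight perturbation
    print(svd_bits(img))
    print((svd_bits(img) == svd_bits(noisy)).mean())    # fraction of stable bits
    ```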

    View the Full Publication
  • 01/01/2010Controlling data disclosure in computational PIR protocolsNing Shang, Gabriel Ghinita, Yongbin Zhou, Elisa Bertino

    Private Information Retrieval (PIR) protocols allow users to learn data items stored at a server which is not fully trusted, without disclosing to the server the particular data element retrieved. Several PIR protocols have been proposed, which provide strong guarantees on user privacy. Nevertheless, in many application scenarios it is important to protect the database as well. In this paper, we investigate the amount of data disclosed by the most prominent PIR protocols during a single run. We show that a malicious user can stage attacks that allow an excessive amount of data to be retrieved from the server. Furthermore, this vulnerability can be exploited even if the client follows the legitimate steps of the PIR protocol, hence the malicious request cannot be detected and rejected by the server. We devise mechanisms that limit the PIR disclosure to a single data item.

    View the Full Publication
  • 01/01/2010Credibility-enhanced curated database: Improving the value of curated databasesQun Ni, Elisa Bertino

    In curated databases, annotations may contain opinions different from those in sources. Moreover, annotations may contradict each other and have uncertainty. Such situations result in a natural question: "Which opinion is most likely to be correct?" In this paper, we define a credibility-enhanced curated database and propose an efficient method to accurately evaluate the correctness of sources and annotations in curated databases.

    View the Full Publication
  • 01/01/2010Efficient and privacy-preserving enforcement of attribute-based access controlNing Shang, Federica Paci, Elisa Bertino

    Modern access control models, developed for protecting data from accesses across the Internet, require verifying the identity of users in order to make sure that users have the required permissions for accessing the data. A user's identity consists of data, referred to as identity attributes, that encode security-relevant properties of the user. Because identity attributes often convey sensitive information about users, they have to be protected. The Oblivious Commitment-Based Envelope (OCBE) protocols address the protection requirements of both users and service providers. The OCBE protocols make it possible for a party, referred to as the sender, to send an encrypted message to a receiver such that the receiver can open the message if and only if its committed value satisfies a predicate, while the sender learns nothing about the receiver's committed value. The possible predicates are the comparison predicates =, ≠, >, <, ≤, ≥. In this paper, we present an extension that improves the efficiency of the EQ-OCBE protocol, that is, the OCBE protocol for equality predicates. Our extension allows a party to decrypt data sent by a service provider if and only if the party satisfies all the equality conditions in the access control policy.
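
    The semantics of the equality case can be illustrated with the toy analogue below; it is emphatically not the EQ-OCBE construction (which relies on Pedersen commitments), but it conveys the contract: the sender encrypts under a key tied to the required attribute value, and only a receiver holding exactly that value recovers the message.

    ```python
    # Toy equality-gated envelope (illustrative semantics only).
    import hashlib

    def key_for(value: str) -> bytes:
        return hashlib.sha256(b"eq-envelope-toy|" + value.encode()).digest()

    def toy_cipher(msg: bytes, key: bytes) -> bytes:
        return bytes(m ^ key[i % len(key)] for i, m in enumerate(msg))

    envelope = toy_cipher(b"service credential", key_for("age=25"))

    print(toy_cipher(envelope, key_for("age=25")))  # b'service credential'
    print(toy_cipher(envelope, key_for("age=24")))  # unreadable bytes
    ```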

    View the Full Publication
  • 01/01/2010Enabling search services on outsourced private spatial dataMan Lung Yiu, Gabriel Ghinita, Christian Jensen, Panos Kalnis

    Cloud computing services enable organizations and individuals to outsource the management of their data to a service provider in order to save on hardware investments and reduce maintenance costs. Only authorized users are allowed to access the data. Nobody else, including the service provider, should be able to view the data. For instance, a real-estate company that owns a large database of properties wants to allow its paying customers to query for houses according to location. On the other hand, the untrusted service provider should not be able to learn the property locations and, e.g., sell the information to a competitor. To tackle the problem, we propose to transform the location datasets before uploading them to the service provider. The paper develops a spatial transformation that re-distributes the locations in space, and it also proposes a cryptography-based transformation. The data owner selects the transformation key and shares it with authorized users. Without the key, it is infeasible to reconstruct the original data points from the transformed points. The proposed transformations present distinct trade-offs between query efficiency and data confidentiality. In addition, we describe attack models for studying the security properties of the transformations. Empirical studies demonstrate that the proposed methods are efficient and applicable in practice.

    View the Full Publication
  • 01/01/2010Enforcing spatial constraints for mobile RBAC systemsMichael Kirkpatrick, Elisa Bertino

    Proposed models for spatially-aware extensions of role-based access control (RBAC) combine the administrative and security advantages of RBAC with the dynamic nature of mobile and pervasive computing systems. However, implementing systems that enforce these models poses a number of challenges. As a solution, we propose an architecture for designing such a system. The architecture is based on an enhanced RBAC model that supports location-based access control policies by incorporating spatial constraints. Enforcing spatially-aware RBAC policies in a mobile environment requires addressing several challenges. First, one must guarantee the integrity of a user's location during an access request. We adopt a proximity-based solution using Near-Field Communication (NFC) technology. The next challenge is to verify that the user's position continuously satisfies the location constraints. To capture these policy restrictions, we incorporate elements of the UCON_ABC usage control model in our architecture. In this work, we also propose a number of protocols, describe our prototype implementation, report the performance of our prototype, and evaluate the security guarantees.

    View the Full Publication
  • 01/01/2010EXAM: a comprehensive environment for the analysis of access control policiesDan Lin, Prathima Rao, Elisa Bertino, Ninghui Li

    As distributed collaborative applications and architectures are adopting policy-based solutions for tasks such as access control, network security and data privacy, the management and consolidation of a large number of policies is becoming a crucial component of such solutions. In large-scale distributed collaborative applications like web services, there is a need for analyzing policy interaction and performing policy integration. In this demonstration, we present EXAM, a comprehensive environment for policy analysis and management, which can be used to perform a variety of functions such as policy property analysis, policy similarity analysis, and policy integration. Our work focuses on analysis of access control policies written in XACML (Extensible Access Control Markup Language). We consider XACML policies because XACML is a rich language able to represent many policies of interest to real-world applications and is gaining widespread adoption in the industry.

    View the Full Publication
  • 01/01/2010FENCE: Continuous access control enforcement in dynamic data stream environmentsRimma Nehme, Hyo-Sang Lim, Elisa Bertino

    In this paper, we present the FENCE framework, which addresses the problem of continuous access control enforcement in dynamic data stream environments. The distinguishing characteristics of FENCE include: (1) the stream-centric approach to security, (2) the symmetric modeling of security for both continuous queries and streaming data, and (3) security-aware query processing that considers both regular and security-related selectivities. In FENCE, both data and query security restrictions are modeled in the form of streaming security metadata, called "security punctuations", embedded inside data streams. We have implemented FENCE in a prototype DSMS and briefly summarize our performance observations.

    View the Full Publication
  • 01/01/2010GDR: a system for guided data repairMohamed Yakout, Ahmed Elmagarmid, Jennifer Neville, Mourad Ouzzani

    Improving data quality is a time-consuming, labor-intensive and often domain-specific operation. Existing data repair approaches are either fully automated or not efficient in interactively involving the users. We present a demo of GDR, a Guided Data Repair system that uses a novel approach to efficiently involve the user alongside automatic data repair techniques to reach better data quality as quickly as possible. Specifically, GDR generates candidate data repairs and acquires feedback on those repairs that would be most beneficial in improving the data quality. GDR quantifies the data quality benefit of generated repairs by combining mechanisms from decision theory and active learning. Based on these benefit scores, groups of repairs are ranked and displayed to the user. User feedback is used to train a machine learning component to eventually replace the user in deciding on the validity of a suggested repair. We describe how the generated repairs are ranked and displayed to the user in a "useful-looking" way and demonstrate how data quality can be effectively improved with minimal feedback from the user.

    View the Full Publication
  • 01/01/2010Group-Based Negotiations in P2P SystemsAnna Squicciarini, Federica Paci, Elisa Bertino, Alberto Trombetta

    In P2P systems, groups are typically formed to share resources and/or to carry on joint tasks. In distributed environments formed by a large number of peers, conventional authentication techniques are inadequate for the group joining process, and more advanced ones are needed. Complex transactions among peers may require more elaborate interactions based on what peers can do or possess instead of peers' identity. In this work, we propose a novel peer group joining protocol. We introduce a highly expressive resource negotiation language, able to support the specification of a large variety of conditions applying to single peers or groups of peers. Moreover, we define protocols to test such resource availability customized to the level of assurance required by the peers. Our approach has been tested and evaluated on an extension of the JXTA P2P platform. Our results show the robustness of our approach in detecting malicious peers, both during the negotiation and during the peer group lifetime. Regardless of the peer group cardinality and interaction frequency, the peers always detect possible free riders within a small time frame.

    View the Full Publication
  • 01/01/2010Guest Editors' Introduction: Data Quality in the Internet EraElisa Bertino, Andrea Maurino, Monica Scannapieco

    The vast amount of data available on the Internet introduces new challenging data quality problems, such as accessibility and usability. Low information quality is common in various Web applications, including Web 2.0 tools. Consequently, information quality on the Internet is one of the most crucial requirements for an effective use of data from the Web and pervasive deployment of Web-based applications.

    View the Full Publication
  • 01/01/2010How to authenticate graphs without leakingAshish Kundu, Elisa Bertino

    Secure data sharing in multi-party environments requires that both authenticity and confidentiality of the data be assured. Digital signature schemes are commonly employed for authentication of data. However, no such technique exists for directed graphs, even though such graphs are one of the most widely used data organization structures. Existing schemes for DAGs are authenticity-preserving but not confidentiality-preserving, and lead to leakage of sensitive information during authentication. In this paper, we propose two schemes, the first of their kind in the literature, for authenticating DAGs and directed cyclic graphs without leaking. They are based on the structure of the graph as defined by depth-first graph traversals and on aggregate signatures. Graphs are structurally different from trees in that they have four types of edges: tree, forward, cross, and back-edges in a depth-first traversal. The fact that an edge is a forward, cross or a back-edge conveys information that is sensitive in several contexts. Moreover, back-edges pose a more difficult problem than the one posed by forward and cross edges, primarily because back-edges add bidirectional properties to graphs. We prove that the proposed technique is both authenticity-preserving and non-leaking. While providing such strong security properties, our scheme is also efficient, as supported by the performance results.
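
    The four edge types in question are the standard depth-first classification. Independently of the paper's signature machinery, the sketch below shows how tree, forward, cross, and back edges are told apart using discovery and finish times, which is exactly the structural information the scheme must avoid leaking.

    ```python
    # Standard DFS edge classification via discovery/finish times.

    def classify_edges(graph):
        disc, fin, kinds, clock = {}, {}, {}, [0]

        def dfs(u):
            clock[0] += 1; disc[u] = clock[0]
            for v in graph.get(u, []):
                if v not in disc:
                    kinds[(u, v)] = "tree"; dfs(v)
                elif v not in fin:
                    kinds[(u, v)] = "back"        # v is a still-active ancestor
                elif disc[v] > disc[u]:
                    kinds[(u, v)] = "forward"     # v is a finished descendant
                else:
                    kinds[(u, v)] = "cross"
            clock[0] += 1; fin[u] = clock[0]

        for node in graph:
            if node not in disc:
                dfs(node)
        return kinds

    g = {"a": ["b", "c"], "b": ["d"], "d": ["a", "c"], "c": []}
    print(classify_edges(g))
    # {('a','b'): 'tree', ('b','d'): 'tree', ('d','a'): 'back',
    #  ('d','c'): 'tree', ('a','c'): 'forward'}
    ```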

    View the Full Publication
  • 01/01/2010Mask: a system for privacy-preserving policy-based access to published contentMohamed Nabeel, Ning Shang, John Zage, Elisa Bertino

    We propose to demonstrate Mask, the first system addressing the seemingly-unsolvable problem of how to selectively share contents among a group of users based on access control policies expressed as conditions against the identity attributes of these users while at the same time assuring the privacy of these identity attributes from the content publisher. Mask consists of three entities: a Content Publisher, Users referred to as Subscribers, and Identity Providers that issue certified identity attributes. The content publisher specifies access control policies against identity attributes of subscribers indicating which conditions the identity attributes of a subscriber must verify in order for this subscriber to access a document or a subdocument. The main novelty of Mask is that, even though the publisher is able to match the identity attributes of the subscribers against its own access control policies, the publisher does not learn the values of the identity attributes of the subscribers; the privacy of the authorized subscribers is thus preserved. Based on the specified access control policies, documents are divided into subdocuments and the subdocuments having different access control policies are encrypted with different keys. Subscribers derive the keys corresponding to the subdocuments they are authorized to access. Key distribution in Mask is supported by a novel group key management protocol by which subscribers can reconstruct the decryption keys from the subscription information they receive from the publisher. The publisher however does not learn which decryption keys each subscriber is able to reconstruct. In this demonstration, we show our system using a healthcare scenario.

    View the Full Publication
  • 01/01/2010Policy-Driven Service Composition with Information Flow ControlWei She, I-Ling Yen, Bhavani Thuraisingham, Elisa Bertino

    Ensuring secure information flow is a critical task for service composition in multi-domain systems. Research in security-aware service composition provides some preliminary solutions to this problem, but there are still issues to be addressed. In this paper, we develop a service composition mechanism specifically focusing on the secure information flow control issues. We first introduce a general model for information flow control in service chains, considering the transformation factors of services and security classes of data resources in a service chain. Then, we develop general rules to guide service composition satisfying secure information flow requirements. Finally, to achieve efficient service composition, we develop a three-phase protocol to allow rapid filtering of candidate compositions that are unlikely to satisfy the information flow constraints and thorough evaluation of highly promising candidates. Our approach can achieve effective and efficient service composition considering secure information flow.
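
    A minimal sketch of the kind of rule being enforced is given below, with hypothetical security levels and transforms: every service in the chain must be cleared for the class of the data it receives, and a service's transformation factor (e.g., aggregation that declassifies) may change the class of what it emits.

    ```python
    # Flow check along a service chain with per-service transformation factors.

    LEVELS = {"public": 0, "internal": 1, "secret": 2}

    def chain_is_safe(data_level, services):
        """services: list of (clearance, transform); transform maps the
        security class of the data the service emits downstream."""
        level = data_level
        for clearance, transform in services:
            if LEVELS[level] > LEVELS[clearance]:
                return False               # service not cleared for its input
            level = transform(level)      # service may change the class
        return True

    identity = lambda lvl: lvl
    declassify = lambda lvl: "internal" if lvl == "secret" else lvl

    print(chain_is_safe("secret", [("secret", declassify), ("internal", identity)]))  # True
    print(chain_is_safe("secret", [("internal", identity)]))                          # False
    ```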

    View the Full Publication
  • 01/01/2010Preserving privacy and fairness in peer-to-peer data integrationHazem Elmeleegy, Mourad Ouzzani, Ahmed Elmagarmid, Ahmad Abusalah

    Peer-to-peer data integration - a.k.a. Peer Data Management Systems (PDMSs) - promises to extend the classical data integration approach to the Internet scale. Unfortunately, some challenges remain before realizing this promise. One of the biggest challenges is preserving the privacy of the exchanged data while passing through several intermediate peers. Another challenge is protecting the mappings used for data translation. Protecting the privacy without being unfair to any of the peers is yet a third challenge. This paper presents a novel query answering protocol in PDMSs to address these challenges. The protocol employs a technique based on noise selection and insertion to protect the query results, and a commutative encryption-based technique to protect the mappings and ensure fairness among peers. An extensive security analysis of the protocol shows that it is resilient to several possible types of attacks. We implemented the protocol within an established PDMS: the Hyperion system. We conducted an experimental study using real data from the healthcare domain. The results show that our protocol manages to achieve its privacy and fairness goals, while maintaining query processing time at the interactive level.

    View the Full Publication
  • 01/01/2010Privacy-Aware Location-Aided Routing in Mobile Ad Hoc NetworksGabriel Ghinita, Mehdi Azarmi, Elisa Bertino

    Mobile Ad-hoc Networks (MANETs) enable users in physical proximity to each other to exchange data without the need for expensive communication infrastructures. Each user represents a node in the network and executes a neighbor discovery protocol. Typically, nodes broadcast beacon messages that are received by other participants within the sender’s communication range. Routing strategies are computed on-line based on the locations of nearby nodes, and geocasting is employed to deliver data packets to their destinations. However, mobile users may be reluctant to share their exact locations with other participants, since location can disclose private details about a person’s lifestyle, religious or political affiliations, etc. A common approach to protect location privacy is to replace exact coordinates with coarser-grained regions, based on the privacy profile of each user. In this paper, we investigate protocols that support MANET routing without disclosing the exact positions of nodes. Each node defines its own privacy profile, and reports cloaked location information to its neighbors. We adopt a novel strategy to advertise beacons, to prevent inference of node locations. We also propose packet forwarding heuristics that rely on cloaking regions, rather than point locations. Our extensive experimental evaluation shows that the proposed routing scheme achieves low delays and high packet delivery ratios, without incurring significant overhead compared to conventional MANET routing protocols.

    View the Full Publication
  • 01/01/2010Private record matching using differential privacyAli Inan, Murat Kantarcioglu, Gabriel Ghinita, Elisa Bertino

    Private matching between datasets owned by distinct parties is a challenging problem with several applications. Private matching allows two parties to identify the records that are close to each other according to some distance functions, such that no additional information other than the join result is disclosed to any party. Private matching can be solved securely and accurately using secure multi-party computation (SMC) techniques, but such an approach is prohibitively expensive in practice. Previous work proposed the release of sanitized versions of the sensitive datasets, which allows blocking, i.e., filtering out subsets of records that cannot be part of the join result. This way, SMC is applied only to a small fraction of record pairs, reducing the matching cost to acceptable levels. The blocking step is essential for the privacy, accuracy and efficiency of matching. However, the state-of-the-art focuses on sanitization based on k-anonymity, which does not provide sufficient privacy. We propose an alternative design centered on differential privacy, a novel paradigm that provides strong privacy guarantees. The realization of the new model presents difficult challenges, such as the evaluation of distance-based matching conditions with the help of only a statistical queries interface. Specialized versions of data indexing structures (e.g., kd-trees) also need to be devised, in order to comply with differential privacy. Experiments conducted on the real-world Census-income dataset show that, although our methods provide strong privacy, their effectiveness in reducing matching cost is not far from that of k-anonymity-based counterparts.
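
    The building block assumed throughout is a differentially private statistical queries interface; the standard way to realize it for counting queries is the Laplace mechanism, sketched below (a generic illustration, not the paper's full matching protocol).

    ```python
    # Laplace mechanism for counting queries (sensitivity 1).
    import numpy as np

    def dp_count(records, predicate, epsilon, rng):
        """Returns an epsilon-differentially private count."""
        true_count = sum(1 for r in records if predicate(r))
        return true_count + rng.laplace(scale=1.0 / epsilon)

    rng = np.random.default_rng(7)
    incomes = [21000, 48000, 52000, 75000, 90000]
    print(dp_count(incomes, lambda x: x > 50000, epsilon=0.5, rng=rng))
    ```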

    View the Full Publication
  • 01/01/2010Privilege States Based Access Control for Fine-Grained Intrusion ResponseAshish Kamra, Elisa Bertino

    We propose an access control model specifically developed to support fine-grained response actions, such as request suspension and request tainting, in the context of an anomaly detection system for databases. To achieve such response semantics, the model introduces the concept of privilege states and orientation modes in the context of a role-based access control system. The central idea in our model is that privileges, assigned to a user or role, have a state attached to them, thereby resulting in a privilege states based access control (PSAC) system. In this paper, we present the design details and a formal model of PSAC tailored to database management systems (DBMSs). PSAC has been designed to also take into account role hierarchies that are often present in the access control models of current DBMSs. We have implemented PSAC in the PostgreSQL DBMS and in the paper, we discuss relevant implementation issues. We also report experimental results concerning the overhead of the access control enforcement in PSAC. Such results confirm that our design and algorithms are very efficient.

    View the Full Publication
  • 01/01/2010Privometer: Privacy protection in social networksNilothpal Talukder, Mourad Ouzzani, Ahmed Elmagarmid, Hazem Elmeleegy

    The increasing popularity of social networks, such as Facebook and Orkut, has raised several privacy concerns. Traditional ways of safeguarding privacy of personal information by hiding sensitive attributes are no longer adequate. Research shows that probabilistic classification techniques can effectively infer such private information. The disclosed sensitive information of friends, group affiliations and even participation in activities, such as tagging and commenting, are considered background knowledge in this process. In this paper, we present a privacy protection tool, called Privometer, that measures the amount of sensitive information leakage in a user profile and suggests self-sanitization actions to regulate the amount of leakage. In contrast to previous research, where inference techniques use publicly available profile information, we consider an augmented model where a potentially malicious application installed in the user’s friend profiles can access substantially more information. In our model, merely hiding the sensitive information is not sufficient to protect the user privacy. We present an implementation of Privometer in Facebook.

    View the Full Publication
  • 01/01/2010Provenance-based trustworthiness assessment in sensor networksHyo-Sang Lim, Yang-Sae Moon, Elisa Bertino

    As sensor networks are being increasingly deployed in decision-making infrastructures such as battlefield monitoring systems and SCADA (Supervisory Control and Data Acquisition) systems, making decision makers aware of the trustworthiness of the collected data is crucial. To address this problem, we propose a systematic method for assessing the trustworthiness of data items. Our approach uses the data provenance as well as the data values in computing trust scores, that is, quantitative measures of trustworthiness. To obtain trust scores, we propose a cyclic framework which well reflects the inter-dependency property: the trust score of the data affects the trust score of the network nodes that created and manipulated the data, and vice-versa. The trust scores of data items are computed from their value similarity and provenance similarity. The value similarity comes from the principle that "the more similar values for the same event, the higher the trust scores". The provenance similarity is based on the principle that "the more different data provenances with similar values, the higher the trust scores". Experimental results show that our approach provides a practical solution for trustworthiness assessment in sensor networks.
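
    The cyclic inter-dependency can be made concrete with the fixed-point sketch below; the similarity numbers are placeholders for the paper's value and provenance similarity measures, so this is only the iteration skeleton, not the full scoring model.

    ```python
    # Iterating the data-trust <-> node-trust dependency to a fixed point.

    def trust_fixed_point(items, iters=20):
        """items: list of (node_id, similarity_score in [0, 1])."""
        node_trust = {n: 0.5 for n, _ in items}         # neutral start
        data_trust = [0.5] * len(items)
        for _ in range(iters):
            # data trust blends the producing node's trust with value similarity
            data_trust = [0.5 * node_trust[n] + 0.5 * sim for n, sim in items]
            # node trust is the average trust of the data the node produced
            for n in node_trust:
                scores = [d for (m, _), d in zip(items, data_trust) if m == n]
                node_trust[n] = sum(scores) / len(scores)
        return node_trust, data_trust

    items = [("sensor1", 0.9), ("sensor1", 0.8), ("sensor2", 0.2)]
    print(trust_fixed_point(items))
    ```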

    View the Full Publication
  • 01/01/2010Publishing Time-Series Data under Preservation of Privacy and Distance OrdersYang-Sae Moon, Hea-Suk Kim, Sang-Pil Kim, Elisa Bertino

    In this paper we address the problem of preserving mining accuracy as well as privacy in publishing sensitive time-series data. For example, people with heart disease do not want to disclose their electrocardiogram time-series, but they still allow mining of some accurate patterns from their time-series. Based on this observation, we introduce the related assumptions and requirements. We show that only randomization methods satisfy all assumptions, but even those methods do not satisfy the requirements. Thus, we discuss the randomization-based solutions that satisfy all assumptions and requirements. For this purpose, we use the noise averaging effect of piecewise aggregate approximation (PAA), which may alleviate the problem of destroying distance orders in randomly perturbed time-series. Based on the noise averaging effect, we first propose two naive solutions that use the random data perturbation in publishing time-series while exploiting the PAA distance in computing distances. There is, however, a tradeoff between these two solutions with respect to uncertainty and distance orders. We thus propose two more advanced solutions that take advantage of both naive solutions. Experimental results show that our advanced solutions are superior to the naive solutions.
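
    The noise averaging effect itself is easy to see in code: PAA replaces each equal-width segment of a series by its mean, so independent zero-mean noise largely cancels within segments. The sketch below is a generic PAA illustration, not the paper's perturbation protocol.

    ```python
    # Piecewise aggregate approximation (PAA) and its noise-averaging effect.
    import numpy as np

    def paa(series, n_segments):
        segments = np.array_split(np.asarray(series, dtype=float), n_segments)
        return np.array([seg.mean() for seg in segments])

    rng = np.random.default_rng(3)
    t = np.linspace(0, 4 * np.pi, 128)
    clean = np.sin(t)
    perturbed = clean + rng.normal(scale=0.5, size=t.size)  # published series

    # Within-segment averaging damps the added noise, so PAA distances on the
    # perturbed series stay close to those on the original:
    print(np.abs(paa(perturbed, 16) - paa(clean, 16)).max())
    ```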

    View the Full Publication
  • 01/01/2010Risk-based access control systems built on fuzzy inferencesQun Ni, Elisa Bertino, Jorge Lobo

    Fuzzy inference is a promising approach to implement risk-based access control systems. However, its application to access control raises some novel problems that have not yet been investigated. First, because there are many different fuzzy operations, one must choose the fuzzy operations that best address security requirements. Second, risk-based access control, though it improves information flow and better addresses requirements from critical organizations, may result in damage by malicious users before mitigating steps are taken. Third, the scalability of a fuzzy inference-based access control system is questionable. The time required by a fuzzy inference engine to estimate risks may be quite high, especially when there are tens of parameters and hundreds of fuzzy rules. However, an access control system may need to serve hundreds or thousands of users. In this paper, we investigate these issues and present our solutions or answers to them.

    View the Full Publication
  • 01/01/2010Automatically Tuning Parallel and Parallelized ProgramsChirag Dave, Rudolf Eigenmann

    In today’s multicore era, parallelization of serial code is essential in order to exploit the architecture's performance potential. Parallelization, especially of legacy code, however, proves to be a challenge as manual efforts must either be directed towards algorithmic modifications or towards analysis of computationally intensive sections of code for the best possible parallel performance, both of which are difficult and time-consuming. Automatic parallelization uses sophisticated compile-time techniques in order to identify parallelism in serial programs, thus reducing the burden on the program developer. Similar sophistication is needed to improve the performance of hand-parallelized programs. A key difficulty is that optimizing compilers are generally unable to estimate the performance of an application or even a program section at compile time, and so the task of performance improvement invariably rests with the developer. Automatic tuning uses static analysis and runtime performance metrics to determine the best possible compile-time approach for optimal application performance. This paper describes an offline tuning approach that uses a source-to-source parallelizing compiler, Cetus, and a tuning framework to tune parallel application performance. The implementation uses an existing, generic tuning algorithm called Combined Elimination to study the effect of serializing parallelizable loops based on measured whole-program execution time, and provides as an outcome a combination of parallel loops that is guaranteed to equal or improve the performance of the original program. We evaluated our algorithm on a suite of hand-parallelized C benchmarks from the SPEC OMP2001 and NAS Parallel benchmarks and provide two sets of results. The first ignores hand-parallelized loops and only tunes application performance based on Cetus-parallelized loops. The second set of results considers the tuning of additional parallelism in hand-parallelized code. We show that our implementation always performs nearly equal to or better than the serial code while tuning only Cetus-parallelized loops, and equal to or better than the hand-parallelized code while tuning additional parallelism.
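
    The elimination loop can be sketched as follows; this is a simplified greedy variant (the published Combined Elimination algorithm batches eliminations and measures real program runs rather than the hypothetical toy cost model used here).

    ```python
    # Greedy elimination over parallel-loop choices, driven by measured time.

    def tune_loops(loops, run_time):
        config = {loop: True for loop in loops}        # True = keep parallel
        base = run_time(config)
        while True:
            gains = {}
            for loop in [l for l, on in config.items() if on]:
                trial = {**config, loop: False}
                gains[loop] = base - run_time(trial)   # >0: serializing helps
            helpful = {l: g for l, g in gains.items() if g > 0}
            if not helpful:
                return config                          # no single change helps
            worst = max(helpful, key=helpful.get)
            config[worst] = False                      # serialize worst loop
            base = run_time(config)

    # Hypothetical cost model: loop2 is too small to amortize fork/join overhead.
    deltas = {"loop1": -2.0, "loop2": 1.5, "loop3": -0.5}  # time delta if parallel
    run_time = lambda cfg: 10.0 + sum(deltas[l] for l, on in cfg.items() if on)
    print(tune_loops(["loop1", "loop2", "loop3"], run_time))
    # {'loop1': True, 'loop2': False, 'loop3': True}
    ```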

    View the Full Publication
  • 01/01/2010Combining Evidence with a Probabilistic Framework for Answer Ranking and Answer Merging in Question AnsweringJeongwoo Ko, Luo Si, Eric Nyberg

    Question answering (QA) aims at finding exact answers to a user’s question from a large collection of documents. Most QA systems combine information retrieval with extraction techniques to identify a set of likely candidates and then utilize some ranking strategy to generate the final answers. This ranking process can be challenging, as it entails identifying the relevant answers amongst many irrelevant ones. This is more challenging in multi-strategy QA, in which multiple answering agents are used to extract answer candidates. As answer candidates come from different agents with different score distributions, how to merge answer candidates plays an important role in answer ranking. In this paper, we propose a unified probabilistic framework which combines multiple evidence to address challenges in answer ranking and answer merging. The hypotheses of the paper are that: (1) the framework effectively combines multiple evidence for identifying answer relevance and their correlation in answer ranking, (2) the framework supports answer merging on answer candidates returned by multiple extraction techniques, (3) the framework can support list questions as well as factoid questions, (4) the framework can be easily applied to a different QA system, and (5) the framework significantly improves performance of a QA system. An extensive set of experiments was done to support our hypotheses and demonstrate the effectiveness of the framework. All of the work substantially extends the preliminary research in Ko et al. (2007a), “A probabilistic framework for answer selection in question answering,” Proceedings of NAACL/HLT.

    View the Full Publication
  • 01/01/2010Detecting Students’ Off-Task Behavior in Intelligent Tutoring Systems with Machine Learning TechniquesSuleyman Cetintas, Luo Si, Yan Ping Xin, Casey Hord

    Identifying off-task behaviors in intelligent tutoring systems is a practical and challenging research topic. This paper proposes a machine learning model that can automatically detect students' off-task behaviors. The proposed model only utilizes the data available from the log files that record students' actions within the system. The model utilizes a set of time features, performance features, and mouse movement features, and is compared to 1) a model that only utilizes time features and 2) a model that uses time and performance features. Different students have different types of behaviors; therefore, a personalized version of the proposed model is constructed and compared to the corresponding nonpersonalized version. In order to address the data sparseness problem, a robust Ridge Regression algorithm is utilized to estimate model parameters. An extensive set of experiment results demonstrates the power of using multiple types of evidence, the personalized model, and the robust Ridge Regression algorithm.
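
    The parameter estimation step has a compact closed form, sketched below: the L2 penalty of ridge regression keeps the weights stable when training data are scarce, which is the sparseness situation the paper targets (a generic illustration, not the paper's feature set).

    ```python
    # Ridge regression closed form: w = (X^T X + lam*I)^{-1} X^T y
    import numpy as np

    def ridge_fit(X, y, lam=1.0):
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    rng = np.random.default_rng(5)
    X = rng.standard_normal((30, 3))                   # 30 examples, 3 features
    y = X @ np.array([0.7, -1.2, 0.3]) + 0.1 * rng.standard_normal(30)
    print(ridge_fit(X, y, lam=0.5))                    # close to the true weights
    ```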

    View the Full Publication
  • 01/01/2010Purdue at TREC 2010 Entity Track: A Probabilistic Framework for Matching Types Between Candidate and Target EntitiesYi Fang, Luo Si, Aditya P. Mathur

    Generative models such as statistical language modeling have been widely studied in the task of expert search to model the relationship between experts and their expertise indicated in supporting documents. On the other hand, discriminative models have received little attention in expert search research, although they have been shown to outperform generative models in many other information retrieval and machine learning applications. In this paper, we propose a principled relevance-based discriminative learning framework for expert search and derive specific discriminative models from the framework. Compared with the state-of-the-art language models for expert search, the proposed research can naturally integrate various document evidence and document-candidate associations into a single model without extra modeling assumptions or effort. An extensive set of experiments has been conducted on two TREC Enterprise track corpora (i.e., W3C and CERC) to demonstrate the effectiveness and robustness of the proposed framework.

    View the Full Publication
  • 01/01/2010A Joint Probabilistic Classification Model for Resource SelectionDzung Hong, Luo Si, Paul Bracke, Michael Witt, Tim Juchcinski

    Resource selection is an important task in Federated Search to select a small number of most relevant information sources. Current resource selection algorithms such as GlOSS, CORI, ReDDE, Geometric Average and the recent classification-based method focus on the evidence of individual information sources to determine the relevance of available sources. Current algorithms do not model the important relationship information among individual sources. For example, an information source tends to be relevant to a user query if it is similar to another source with high probability of being relevant. This paper proposes a joint probabilistic classification model for resource selection. The model estimates the probability of relevance of information sources in a joint manner by considering both the evidence of individual sources and their relationship. An extensive set of experiments has been conducted on several datasets to demonstrate the advantage of the proposed model.

    View the Full Publication
  • 01/01/2010Predicting Correctness of Problem Solving in ITS with a Temporal Collaborative Filtering ApproachSuleyman Cetintas, Luo Si, Yan Ping Xin, Casey Hord

    Collaborative filtering (CF) is a technique that utilizes how users are associated with items in a target application and predicts the utility of items for a particular user. Temporal collaborative filtering (temporal CF) is a time-sensitive CF approach that considers the change in user-item interactions over time. Despite its capability to deal with dynamic educational applications with rapidly changing user-item interactions, there is no prior research on temporal CF for educational tasks. This paper proposes a temporal CF approach to automatically predict the correctness of students’ problem solving in an intelligent math tutoring system. Unlike traditional user-item interactions, a student may work on the same problem multiple times, and there are usually multiple interactions for a student-problem pair. The proposed temporal CF approach effectively utilizes information coming from multiple interactions and is compared to i) a traditional CF approach, ii) a temporal CF approach that uses a sliding-time-window but ignores old data and multiple interactions and iii) a combined temporal CF approach that uses a sliding-time-window together with multiple interactions. An extensive set of experiment results shows that using multiple interactions significantly improves the prediction accuracy, while using sliding-time-windows does not make a significant difference.

    View the Full Publication
  • 01/01/2010Learning to Identify Students’ Relevant and Irrelevant Questions in a Micro-blogging Supported ClassroomSuleyman Cetintas, Luo Si, Sugato Chakravarty, Hans Aagard, Kyle Bowen

    This paper proposes a novel application of text categorization for two types of questions asked in a micro-blogging supported classroom, namely relevant and irrelevant questions. Empirical results and analysis show that utilizing the correlation between questions and the available lecture materials, along with personalization and question text, leads to significantly higher categorization accuracy than i) using personalization along with question text and ii) using question text alone.

    View the Full Publication
  • 12/01/2009Casper: Query processing for location services without compromising privacyChin-Yin Chow, Mohamed Mokbel, Walid G. Aref

    In this article, we present a new privacy-aware query processing framework, Casper, in which mobile and stationary users can obtain snapshot and/or continuous location-based services without revealing their private location information. In particular, we propose a privacy-aware query processor embedded inside a location-based database server to deal with snapshot and continuous queries based on the knowledge of the user's cloaked location rather than the exact location. Our proposed privacy-aware query processor is completely independent of how we compute the user's cloaked location. In other words, any existing location anonymization algorithms that blur the user's private location into cloaked rectilinear areas can be employed to protect the user's location privacy. We first propose a privacy-aware query processor that not only supports three new privacy-aware query types, but also achieves a trade-off between query processing cost and answer optimality. Then, to improve system scalability of processing continuous privacy-aware queries, we propose a shared execution paradigm that shares query processing among a large number of continuous queries. The proposed scalable paradigm can be tuned through two parameters to trade off between system scalability and answer optimality. Experimental results show that our query processor achieves high-quality snapshot and continuous location-based services while supporting queries and/or data with cloaked locations.

    View the Full Publication
  • 11/03/2009The RUM-tree: supporting frequent updates in R-trees using memosYasin Silva, Xiaopeng Xiong, Walid G. Aref

    The problem of frequently updating multi-dimensional indexes arises in many location-dependent applications. While the R-tree and its variants are the dominant choices for indexing multi-dimensional objects, the R-tree exhibits inferior performance in the presence of frequent updates. In this paper, we present an R-tree variant, termed the RUM-tree (which stands for R-tree with update memo) that reduces the cost of object updates. The RUM-tree processes updates in a memo-based approach that avoids disk accesses for purging old entries during an update process. Therefore, the cost of an update operation in the RUM-tree is reduced to the cost of only an insert operation. The removal of old object entries is carried out by a garbage cleaner inside the RUM-tree. In this paper, we present the details of the RUM-tree and study its properties. We also address the issues of crash recovery and concurrency control for the RUM-tree. Theoretical analysis and comprehensive experimental evaluation demonstrate that the RUM-tree outperforms other R-tree variants by up to one order of magnitude in scenarios with frequent updates.
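
    The memo idea can be sketched with plain Python containers standing in for the R-tree (so this shows only the bookkeeping, not the spatial index): an update is just an insert tagged with a fresh stamp, stale entries are skipped at query time, and a garbage cleaner purges them lazily instead of with an immediate disk-touching delete.

    ```python
    # Update-memo bookkeeping sketch (dict/list stand-ins for the R-tree).

    class UpdateMemoIndex:
        def __init__(self):
            self.entries = []   # stand-in for R-tree leaf entries
            self.memo = {}      # object id -> latest stamp
            self.stamp = 0

        def upsert(self, oid, location):
            self.stamp += 1
            self.memo[oid] = self.stamp          # older entries become garbage
            self.entries.append((oid, self.stamp, location))

        def query(self, predicate):
            return [(oid, loc) for oid, st, loc in self.entries
                    if st == self.memo[oid] and predicate(loc)]

        def clean(self):
            """Garbage cleaner: drop entries whose stamp is obsolete."""
            self.entries = [e for e in self.entries if e[1] == self.memo[e[0]]]

    idx = UpdateMemoIndex()
    idx.upsert("car1", (0, 0)); idx.upsert("car1", (5, 5))
    print(idx.query(lambda p: p[0] >= 0))   # only the latest car1 position
    idx.clean(); print(len(idx.entries))    # 1
    ```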

    View the Full Publication
  • 09/01/2009Deriving Customized Integrated Web Query InterfacesEduard Dragut, Fang Fang, Clement Yu, Weiyi Meng

    Given a set of query interfaces from providers in the same domain (e.g., car rental), the goal is to build automatically an integrated interface that makes the access to individual sources transparent to users. Our goal is to allow users to choose their preferred providers. Consequently, the integrated interface should reflect only the query interfaces of these sources. The problem scrutinized in this work is deriving customized integrated interfaces. On the hypothesis that query interfaces on the Web are easily understood by ordinary users (well-designed assumption), mainly because of the way their attributes are organized (structural property) and named (lexical property), we develop algorithms to construct customized integrated interfaces. Experiments are performed to validate our analytical studies, including a user survey.

    View the Full Publication
  • 09/01/2009An Overview of VeryIDX - A Privacy-Preserving Digital Identity Management System for Mobile DevicesFederica Paci, Elisa Bertino, Sam Kerr, Anna Squicciarini, Jungha Woo

    Users increasingly use their mobile devices to communicate, to conduct business transactions and to access resources and services. In such a scenario, digital identity management (DIM) technology is fundamental in customizing user experience, protecting privacy, underpinning accountability in business transactions, and in complying with regulatory controls. A user's identity consists of data, referred to as identity attributes, that encode security-relevant properties of the user. However, identity attributes can be the target of several attacks: the loss or theft of mobile devices results in the exposure of identity attributes; identity attributes that are sent over Wi-Fi or 3G networks can be easily intercepted; identity attributes can also be captured via Bluetooth connections without the user’s consent; and mobile viruses, worms and Trojan horses can access the identity attributes stored on mobile devices if this information is not protected by passwords or PIN numbers. Therefore, assuring the privacy and security of identity attributes, as well as of any sensitive information stored on mobile devices, is crucial. In this paper we address such problems by proposing an approach to manage user identity attributes while assuring their privacy-preserving usage. The approach is based on the concept of privacy-preserving multi-factor authentication, achieved by a new cryptographic primitive which uses aggregate signatures on commitments that are then used for aggregate zero-knowledge proof of knowledge (ZKPK) protocols. We present the implementation of such approach on Nokia NFC cellular phones and report performance evaluation results.

    View the Full Publication
  • 08/24/2009Online Piece-wise Linear Approximation of Numerical Streams with Precision GuaranteesHazem Elmeleegy, Ahmed Elmagarmid, Emmanuel Cecchet, Walid G. Aref, Willy Zwaenepoel

    Continuous "always-on" monitoring is beneficial for a number of applications, but potentially imposes a high load in terms of communication, storage and power consumption when a large number of variables need to be monitored. We introduce two new filtering techniques, swing filters and slide filters, that represent within a prescribed precision a time-varying numerical signal by a piecewise linear function, consisting of connected line segments for swing filters and (mostly) disconnected line segments for slide filters. We demonstrate the effectiveness of swing and slide filters in terms of their compression power by applying them to a real-life data set plus a variety of synthetic data sets. For nearly all combinations of signal behavior and precision requirements, the proposed techniques outperform the earlier approaches for online filtering in terms of data reduction. The slide filter, in particular, consistently dominates all other filters, with up to twofold improvement over the best of the previous techniques.
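
    The following sketch gives a simplified connected piecewise-linear filter in the spirit of the swing filter (it is not the paper's exact algorithm): it maintains a cone of feasible slopes from the current segment origin and closes the segment when a new point cannot be matched within the prescribed precision eps.

    ```python
    # Simplified connected piecewise-linear filter: every original point is
    # guaranteed to lie within +/- eps of the emitted approximation.
    # Illustrative sketch only, in the spirit of the swing filter.

    def swing_like_filter(points, eps):
        """points: list of (t, v) with strictly increasing t; returns knot points."""
        t0, v0 = points[0]
        segments = [(t0, v0)]
        lo, hi = float("-inf"), float("inf")   # feasible slope cone from (t0, v0)
        t_prev = t0
        for t, v in points[1:]:
            new_lo, new_hi = (v - eps - v0) / (t - t0), (v + eps - v0) / (t - t0)
            if max(lo, new_lo) > min(hi, new_hi):
                # No single slope fits all points: close the segment at t_prev.
                slope = (lo + hi) / 2
                v0, t0 = v0 + slope * (t_prev - t0), t_prev
                segments.append((t0, v0))
                lo, hi = (v - eps - v0) / (t - t0), (v + eps - v0) / (t - t0)
            else:
                lo, hi = max(lo, new_lo), min(hi, new_hi)
            t_prev = t
        slope = 0.0 if hi == float("inf") else (lo + hi) / 2
        segments.append((t_prev, v0 + slope * (t_prev - t0)))
        return segments

    # Three nearly collinear points compress into one segment; the jump at
    # t = 3 forces a second segment.
    print(swing_like_filter([(0, 0.0), (1, 1.0), (2, 2.05), (3, 10.0)], eps=0.1))
    ```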

    View the Full Publication
  • 08/13/2009Diversity and Strain Specificity of Plant Cell Wall Degrading Enzymes Revealed by the Draft Genome of Ruminococcus flavefaciens FD-1Margret E. Berg Miller, Dionysios A. Antonopoulos, Mark Brand, Albert Bari, Alvaro Hernandez, Jyothi Thimmapuram, Bryan A. White, Marco Rincon, Harry J. Flint, Bernard Henrissat, Pedro M. Coutinho

    Ruminococcus flavefaciens is a predominant cellulolytic rumen bacterium, which forms a multi-enzyme cellulosome complex that could play an integral role in the ability of this bacterium to degrade plant cell wall polysaccharides. Identifying the major enzyme types involved in plant cell wall degradation is essential for gaining a better understanding of the cellulolytic capabilities of this organism as well as highlighting potential enzymes for application in improvement of livestock nutrition and for conversion of cellulosic biomass to liquid fuels.

    View the Full Publication
  • 07/01/2009Guest Editorial: Special Section on Service-Oriented Distributed Computing SystemsElisa Bertino, William Cheng-Chung Chu

    Guest Editorial: Special Section on Service-Oriented Distributed Computing Systems

    View the Full Publication
  • 07/01/2009Visualization for Access Control Policy Analysis Results Using Multi-level GridsPrathima Rao, Gabriel Ghinita, Elisa Bertino, Jorge Lobo

    The rapid increase in deployment of policy-based access control systems confronts security administrators with the daunting task of managing a large number of complex access control policies. Several policy analysis types (e.g., policy similarity, policy conflict, and change-impact analysis) have been proposed to help administrators maintain consistent and conflict-free policy repositories. However, there has not been much focus on the presentation and the ensuing interpretation of the results of such analyses, which greatly undermines usability. In this paper, we present a novel multi-level grid-based technique for visualizing results of policy analysis. We implemented this technique, and we present a sample policy similarity analysis scenario that highlights the advantages of the proposed result visualization method.

    View the Full Publication
  • 07/01/2009Privacy-preserving techniques for location-based servicesElisa Bertino

    Recent advances in positioning techniques, small devices, GIS-based services, and ubiquitous connectivity have enabled a large variety of location-based services that tailor services to the location of the individual requesting them. Location information, however, while critical for providing customized services, can lead to privacy breaches if misused. By cross-referencing location information about an individual with other information and by exploiting domain knowledge, an attacker may infer sensitive information about the individual, such as healthcare or financial information. To address such problems, different techniques have been proposed that are based on two main approaches: location cloaking, under which a suitably large region is returned to the service provider instead of the precise user location [1]; and location k-anonymization, under which the location of an individual is returned to the service provider only if it is indistinguishable from the locations of k-1 other individuals [5, 6]. These techniques have, however, a major drawback in that they do not take into account domain knowledge and are thus prone to location inference attacks [2]. Given a generalized location of an individual, obtained for example through location cloaking, such an attack exploits knowledge about the semantics of spatial entities to infer bounds on the location of an individual that are more precise than the generalized location. Another major drawback is that those approaches do not support personalized privacy preferences. We believe that supporting such preferences is crucial, in that different individuals have different preferences with respect to which locations are considered privacy-sensitive.
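
    To make the cloaking approach concrete, here is a minimal sketch (an illustrative assumption, not any of the cited algorithms) that snaps the user's position to a grid cell, so the provider sees only a bounding box:

    ```python
    # Minimal location-cloaking sketch: instead of exact coordinates, the
    # service provider receives the grid cell containing the user. Cell size
    # trades privacy for service quality; real systems (and the
    # k-anonymization approach above) choose the region far more carefully.

    def cloak(x, y, cell_size):
        """Return the bounding box of the grid cell containing (x, y)."""
        cx = (x // cell_size) * cell_size
        cy = (y // cell_size) * cell_size
        return (cx, cy, cx + cell_size, cy + cell_size)

    # The provider sees only a 500 m x 500 m box, not the precise position.
    print(cloak(40712.0, 74005.0, 500))   # (40500.0, 74000.0, 41000.0, 74500.0)
    ```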

    View the Full Publication
  • 06/01/2009A Characterization of the problem of secure provenance managementShouhuai Xu, Qun Ni, Elisa Bertino, Ravi Sandhu

    Data (or information) provenance has many important applications. However, prior work on data provenance management has focused almost exclusively on the collection, representation, querying, and storage of provenance data. In contrast, the security aspects of provenance management have been neither well understood nor adequately addressed. A natural question then is: what would a secure provenance management system - perhaps as an analogy to secure database management systems - look like? In this paper, we explore the problem space of secure provenance management systems with an emphasis on the security requirements for such systems, and characterize desired solutions for tackling the problem. We believe that this paper makes a significant step towards a comprehensive solution to the problem of secure provenance management.

    View the Full Publication
  • 06/01/2009A framework for efficient data anonymization under privacy and accuracy constraintsGabriel Ghinita, Panagiotis Karras, Panos Kalnis, Nikos Mamoulis

    Recent research studied the problem of publishing microdata without revealing sensitive information, leading to the privacy-preserving paradigms of k-anonymity and l-diversity. k-anonymity protects against the identification of an individual's record. l-diversity, in addition, safeguards against the association of an individual with specific sensitive information. However, existing approaches suffer from at least one of the following drawbacks: (i) l-diversification is solved by techniques developed for the simpler k-anonymization problem, causing unnecessary information loss. (ii) The anonymization process is inefficient in terms of computational and I/O cost. (iii) Previous research focused exclusively on the privacy-constrained problem and ignored the equally important accuracy-constrained (or dual) anonymization problem.

    In this article, we propose a framework for efficient anonymization of microdata that addresses these deficiencies. First, we focus on one-dimensional (i.e., single-attribute) quasi-identifiers, and study the properties of optimal solutions under the k-anonymity and l-diversity models for the privacy-constrained (i.e., direct) and the accuracy-constrained (i.e., dual) anonymization problems. Guided by these properties, we develop efficient heuristics to solve the one-dimensional problems in linear time. Finally, we generalize our solutions to multidimensional quasi-identifiers using space-mapping techniques. Extensive experimental evaluation shows that our techniques clearly outperform the existing approaches in terms of execution time and information loss.
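
    As a rough illustration of the one-dimensional setting, the following sketch greedily cuts the sorted quasi-identifier values into consecutive groups of at least k and generalizes each group to its value range; the paper's heuristics are more refined, but the linear-time flavor is the same.

    ```python
    # Simplified greedy 1-D k-anonymization: sort records by the single
    # quasi-identifier, cut them into consecutive groups of at least k, and
    # generalize each group to its range. Linear once sorted; illustrative only.

    def k_anonymize_1d(values, k):
        vals = sorted(values)
        groups, i = [], 0
        while i < len(vals):
            # Take k values; fold any undersized remainder into the last group.
            j = i + k if len(vals) - (i + k) >= k else len(vals)
            groups.append((vals[i], vals[j - 1]))   # generalized range for the group
            i = j
        return groups

    ages = [21, 34, 35, 39, 40, 44, 58, 62]
    print(k_anonymize_1d(ages, k=3))   # [(21, 35), (39, 62)]
    ```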

    View the Full Publication
  • 05/01/2009An Interoperable Approach to Multifactor Identity VerificationFederica Paci, Rodolfo Ferrini, Andrea Musci, Kevin Steuer Jr

    Naming heterogeneity occurs in digital identity management systems when the various parties involved in managing digital identities use different vocabularies to denote identity attribute names. To resolve potential interoperability issues due to naming heterogeneity, the authors propose a new protocol that uses lookup tables, dictionaries, and ontology mapping techniques.

    View the Full Publication
  • 05/01/2009Interactive Location Cloaking with the PROBE ObfuscatorGabriel Ghinita, Maria Luisa Damiani, Elisa Bertino, Claudio Silvestri

    The problem of private location-based queries has been intensively researched in recent years. Several location protection algorithms exist, most of which use some form of location cloaking. However, existing work focuses on the analysis of privacy and performance, and less on the user's perspective on location privacy. We developed a prototype of the PROBE system with an emphasis on visualization of the location cloaking process, which improves user experience and increases privacy awareness.

    View the Full Publication
  • 05/01/2009Location-Aware Authentication and Access ControlElisa Bertino, Michael Kirkpatrick

    The paper first discusses why taking location information into account in authentication and access control is important. The paper then surveys current approaches to location-aware authentication, including the notion of context-based flexible authentication policies, and to location-aware access control, with a focus on the GEO-RBAC model. Throughout the discussion, the paper identifies open research directions.

    View the Full Publication
  • 04/10/2009Similarity Group-ByYasin Silva, Walid G. Aref, Mohamed Ali

    Group-by is a core database operation that is used extensively in OLTP, OLAP, and decision support systems. In many application scenarios, it is necessary to group similar but not necessarily equal values. In this paper we propose a new SQL construct that supports similarity-based group-by (SGB). SGB is not a new clustering algorithm, but rather a practical and fast similarity grouping query operator that is compatible with other SQL operators and can be combined with them to answer similarity-based queries efficiently. In contrast to expensive clustering algorithms, the proposed similarity group-by operator maintains low execution times while still generating meaningful groupings that address many application needs. The paper presents a general definition of the similarity group-by operation and gives three instances of this definition. The paper also discusses how optimization techniques for the regular group-by can be extended to the case of SGB. The proposed operators are implemented inside PostgreSQL. The performance study shows that the proposed similarity-based group-by operators have good scalability properties, with at most a 25% increase in execution time over the regular group-by.
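
    A minimal sketch of one plausible SGB instance, grouping values whose gap to their neighbor stays within a threshold; the actual SQL-level syntax and the paper's three operator instances differ in detail.

    ```python
    # One plausible instance of similarity grouping on a 1-D attribute:
    # scan values in order and start a new group whenever the gap to the
    # previous value exceeds a threshold. This mirrors the "similar but not
    # necessarily equal" grouping SGB supports.

    def similarity_group_by(values, max_gap):
        vals = sorted(values)
        groups = [[vals[0]]]
        for prev, cur in zip(vals, vals[1:]):
            if cur - prev <= max_gap:
                groups[-1].append(cur)
            else:
                groups.append([cur])
        return groups

    prices = [9.99, 10.05, 10.10, 24.50, 24.60, 99.00]
    for g in similarity_group_by(prices, max_gap=0.5):
        print(g, "avg:", round(sum(g) / len(g), 2))
    ```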

    View the Full Publication
  • 04/10/2009Hippocratic PostgreSQLJ Padma, Yasin Silva, Mohamed Arshad, Walid G. Aref

    Privacy preservation has become an important requirement in information systems that deal with personal data. In many cases this requirement is imposed by laws that recognize the right of data owners to control with whom their information is shared and the purposes for which it can be shared. Hippocratic databases have been proposed as an answer to this privacy requirement; they extend the architecture of standard DBMSs with components that ensure personal data is handled in compliance with its associated privacy definitions. Previous work on Hippocratic databases has proposed the design of some of these components. Unfortunately, not much work has been done to implement these components as an integral part of a DBMS and to study the problems faced in realizing Hippocratic databases. The main goal of the Hippocratic PostgreSQL project is to perform this implementation and study. The project includes the implementation of components to support limited disclosure, limited retention time, and the management of multiple policies and policy versions. This demo presents the use of these components, both from a terminal-based SQL command interface and through a Web-based healthcare application that makes use of the implemented database-level privacy features. Hippocratic PostgreSQL has the novel feature of augmenting the Hippocratic DBMS engine functionality with both k-anonymity and generalization hierarchies. Several interesting problems emerge as a result, and their solutions are presented in the context of this demo.

    View the Full Publication
  • 04/10/2009Chameleon: Context-Awareness inside DBMSsHicham Elmongui, Walid G. Aref, Mohamed Mokbel

    Context is any information used to characterize the situation of an entity. Examples of contexts include time, location, identity, and activity of a user. This paper proposes a general context-aware DBMS, named Chameleon, that will eliminate the need for having specialized database engines, e.g., spatial DBMS, temporal DBMS, and Hippocratic DBMS, since space, time, and identity can be treated as contexts in the general context-aware DBMS. In Chameleon, we can combine multiple contexts into more complex ones using the proposed context composition, e.g., a Hippocratic DBMS that also provides spatio-temporal and location contextual services. As a proof of concept, we construct two case studies using the same context-aware DBMS platform within Chameleon. One treats identity as a context to realize a privacy-aware (Hippocratic) database server, while the other treats space as a context to realize a spatial database server using the same proposed constructs and interfaces of Chameleon.

    View the Full Publication
  • 03/01/2009Efficient Private Record LinkageMohamed Yakout, Mikhail J. Atallah, Ahmed Elmagarmid

    Record linkage is the computation of the associations among records of multiple databases. It arises in contexts such as the integration of such databases, online interactions and negotiations, and many others. The autonomous entities who wish to carry out the record matching computation are often reluctant to fully share their data. In such a framework, where the entities are unwilling to share data with each other, the problem of carrying out the linkage computation without full data exchange has been called private record linkage. Previous private record linkage techniques have made use of a third party. We provide efficient techniques for private record linkage that improve on previous work in that (i) they make no use of a third party, and (ii) they achieve much better performance than previous schemes in terms of execution time and quality of output (i.e., practically no false negatives and minimal false positives). Our software implementation provides experimental validation of our approach and the above claims.

    View the Full Publication
  • 02/28/2009Transcriptome pathways unique to dehydration tolerant relatives of modern wheat.N Z. Ergen, Jyothi Thimmapuram, H J. Bohnert, H Budak

    Among abiotic stressors, drought is a major factor responsible for dramatic yield loss in agriculture. In order to reveal differences in the global expression profiles of drought-tolerant and drought-sensitive wild emmer wheat genotypes, a previously deployed shock-like dehydration process was utilized to compare transcriptomes at two time points in root and leaf tissues using Affymetrix GeneChip(R) Wheat Genome Array hybridization. The comparison of transcriptomes reveals several unique genes or expression patterns, such as differential usage of IP(3)-dependent signal transduction pathways, ethylene- and abscisic acid (ABA)-dependent signaling, and preferential or faster induction of ABA-dependent transcription factors by the tolerant genotype, that distinguish the contrasting genotypes and are indicative of distinctive stress response pathways. The data also show that wild emmer wheat is capable of engaging known drought stress responsive mechanisms. The global comparison of transcriptomes in the absence of and after dehydration underlined the gene networks, especially in root tissues, that may have been lost in the selection processes generating modern bread wheats.

    View the Full Publication
  • 01/01/2009Electronic Ink IndexingWalid G. Aref

    eAccessibility refers to the access of Information and Communication Technologies (ICT) by people with disabilities, with particular emphasis on the World Wide Web. It is the extent to which the use of an application or service is affected by the user’s particular functional limitations or abilities (permanent or temporary). eAccessibility can be considered as a fundamental prerequisite of usability.

    View the Full Publication
  • 01/01/2009Window-based Query ProcessingWalid G. Aref

    Data streams are infinite in nature. As a result, a query that executes over data streams specifies a "window" of focus, i.e., the part of the data stream that is of interest to the query. When new data items arrive in the data stream, the window may either expand or slide to allow the query to process these new data items. Hence, queries over data streams are continuous in nature, i.e., the query is continuously re-evaluated each time the query window slides. Window-based query processing on data streams refers to the various ways and techniques for processing and evaluating continuous queries over windows of data stream items.
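
    A minimal sketch of a count-based sliding window (an illustrative example, not any particular engine's implementation); time-based windows work analogously with timestamps.

    ```python
    # Count-based sliding-window aggregate over a stream: the continuous
    # query is re-evaluated each time the window slides to admit a new item.
    from collections import deque

    def windowed_averages(stream, window_size):
        window = deque(maxlen=window_size)   # oldest item expires automatically
        for item in stream:
            window.append(item)              # the window slides
            if len(window) == window_size:
                yield sum(window) / window_size   # continuous re-evaluation

    readings = [10, 12, 11, 50, 13, 12]
    print(list(windowed_averages(readings, window_size=3)))
    # -> [11.0, 24.33..., 24.67..., 25.0]
    ```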

    View the Full Publication
  • 01/01/2009Space-Filling Curves for Query ProcessingMohamed Mokbel, Walid G. Aref

    Given a query Q, a one-dimensional index structure I (e.g., a B-tree), and a set of D-dimensional points, a space-filling curve S is used to map the D-dimensional points into a set of one-dimensional points that can be indexed through I for an efficient execution of query Q. The main idea is that space-filling curves are used as a way of mapping the multi-dimensional space into the one-dimensional space such that existing one-dimensional query processing and indexing techniques can be applied.
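
    For D = 2, the Z-order (Morton) curve is a common such mapping; the following sketch interleaves coordinate bits into a single key that a B-tree can index (an illustrative example, not tied to any specific system):

    ```python
    # Z-order (Morton) mapping for D = 2: bit-interleave the coordinates so
    # that nearby 2-D points tend to get nearby 1-D keys.

    def z_order(x, y, bits=16):
        """Interleave the bits of x and y into a single Morton key."""
        key = 0
        for i in range(bits):
            key |= ((x >> i) & 1) << (2 * i)       # even bit positions from x
            key |= ((y >> i) & 1) << (2 * i + 1)   # odd bit positions from y
        return key

    for p in [(3, 5), (3, 6), (10, 10)]:
        print(p, "->", z_order(*p))
    # A range scan on a B-tree over these keys approximates a 2-D window query.
    ```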

    View the Full Publication
  • 01/01/2009Indexing Historical Spatio-Temporal DataMohamed Mokbel, Walid G. Aref

    Consider an object O that reports to a database server two consecutive locations P0 = (x0, y0) and P1 = (x1, y1) at times t0 and t1, respectively. The database server has no idea about the exact locations of object O between t0 and t1. To be able to answer queries regarding the user location at any time, the database server interpolates between the two accurate locations through a trajectory that connects P0 and P1 by a straight line. While object O keeps sending location samples, the database server keeps accumulating a set of consecutive trajectory lines that represent the historical movement of object O. Indexing historical spatio-temporal data includes dealing with such large numbers of trajectories. The main idea is to organize past trajectories in a way that supports historical spatial, temporal, and spatio-temporal queries.
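
    The interpolation step is simple enough to state directly; a small sketch:

    ```python
    # Server-side interpolation between two reported samples: the location
    # at time t is read off the straight line connecting them. Historical
    # trajectories are sets of such line segments.

    def interpolate(p0, t0, p1, t1, t):
        """Linearly interpolate the position at time t, with t0 <= t <= t1."""
        f = (t - t0) / (t1 - t0)
        return (p0[0] + f * (p1[0] - p0[0]),
                p0[1] + f * (p1[1] - p0[1]))

    # Object reported (0, 0) at t=10 and (4, 8) at t=20; where was it at t=15?
    print(interpolate((0, 0), 10, (4, 8), 20, 15))   # (2.0, 4.0)
    ```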

    View the Full Publication
  • 01/01/2009Multi-dimensional phenomenon-aware stream query processingAshish Bindra, Ankur Teredesai, Mohamed Ali, Walid G. Aref

    Geographically co-located sensors tend to participate in the same environmental phenomena. Phenomenon-aware stream query processing improves scalability by subscribing each query only to the subset of sensors that participate in the phenomena of interest to that query. For sensors that generate readings with a multi-attribute schema, phenomena may develop across the values of one or more attributes. However, tracking and detecting phenomena across all attributes does not scale well as the number of dimensions increases. As the size of the sensor network grows, and as the number of attributes tracked by each sensor increases, this becomes a major bottleneck. In this paper, we present a novel n-dimensional phenomenon detection and tracking mechanism (termed nd-PDT) over n-ary sensor readings. We reduce the number of dimensions to be tracked by first dropping dimensions without any meaningful phenomena, and then further reduce the dimensionality by continuously detecting and updating various forms of functional dependencies among the phenomenon dimensions.

    View the Full Publication
  • 01/01/2009Supporting annotations on relationsMohamed Eltabakh, Walid G. Aref, Ahmed Elmagarmid, Mourad Ouzzani, Yasin Silva

    Annotations play a key role in understanding and curating databases. Annotations may represent comments, descriptions, or lineage information, among others. Annotation management is a vital mechanism for sharing knowledge and building an interactive and collaborative environment among database users and scientists. What makes it challenging is that annotations can be attached to database entities at various granularities, e.g., at the table, tuple, column, or cell levels, or more generally, to any subset of cells that results from a select statement. Therefore, simple comment fields in tuples would not work because of the combinatorial nature of the annotations. In this paper, we present extensions to current database management systems to support annotations. We propose storage schemes to efficiently store annotations at multiple granularities, i.e., at the table, tuple, column, and cell levels. Compared to storing the annotations with the individual cells, the proposed schemes achieve more than an order-of-magnitude reduction in storage and up to 70% savings in query execution time. We define types of annotations that inherit different behaviors. Through these types, users can specify, for example, whether or not an annotation is continuously applied over newly inserted data and whether or not an annotation is archived when the base data is modified. These annotation types raise several storage and processing challenges that are addressed in the paper. We propose declarative ways to add, archive, query, and propagate annotations. The proposed mechanisms are realized through extensions to standard SQL. We implemented the proposed functionalities inside PostgreSQL with an easy-to-use Excel-based front-end graphical interface.

    View the Full Publication
  • 01/01/2009Exploiting similarity-aware grouping in decision support systemsYasin Silva, Muhammed Arshad, Walid G. Aref

    Decision Support Systems (DSS) are information systems that support decision making processes. In many scenarios these systems are built on top of data managed by DBMSs and make extensive use of its underlying grouping and aggregation capabilities, i.e., Group-by operation. Unfortunately, the standard grouping operator has the inherent limitation of being based only on equality, i.e., all the tuples in a group share the same values of the grouping attributes. Similarity-based Group-by (SGB) has been recently proposed as an extension aimed to overcome this limitation. SGB allows fast formation of groups with similar objects under different grouping strategies and the pipelining of results for further processing. This demonstration presents how SGB can be effectively used to build useful DSSs. The presented DSS has been built around the data model and queries of the TPC-H benchmark intending to be representative of complex business analysis applications. The system provides intuitive dashboards that exploit similarity aggregation queries to analyze: (1) customer clustering, (2) profit and revenue, (3) marketing campaigns, and (4) discounts. The presented DSS runs on top of PostgreSQL whose query engine is extended with similarity grouping operators.

    View the Full Publication
  • 01/01/2009Workshop Organizers’ MessageShazia Sadiq, Ke Deng, Xiaofang Zhou, Xiaochun Yang, Walid G. Aref, Alex Delis, Qing Liu, Kai Xu

    Poor data quality is known to compromise the credibility and efficiency of commercial as well as public endeavours. Several developments from industry and academia have contributed significantly towards addressing the problem. These typically include analysts and practitioners who have contributed to the design of strategies and methodologies for data governance; solution architects, including software vendors, who have contributed appropriate system architectures that promote data integration; and data experts who have contributed to data quality problems such as duplicate detection, identification of outliers, consistency checking, and many more through the use of computational techniques. The attainment of true data quality lies at the convergence of these three aspects, namely the organizational, architectural, and computational.

    View the Full Publication
  • 01/01/2009Measuring the structural similarity among XML documents and DTDsElisa Bertino, Giovanna Guerrini, Michela Bertolotto

    View the Full Publication
  • 01/01/2009StreamShield: a stream-centric approach towards security and privacy in data stream environmentsRimma Nehme, Hyo-Sang Lim, Elisa Bertino, Elke Rundensteiner

    We propose to demonstrate StreamShield, a system designed to address the problem of security and privacy in the context of Data Stream Management Systems (DSMSs). In StreamShield, continuous access control is enforced by taking a novel "stream-centric" approach towards security. Security policies are not persistently stored on the server; rather, they are carried by security metadata, called "security punctuations", and are embedded into streams together with the data. We distinguish between two types of security punctuations: (1) "data security punctuations" (dsps), describing the data-side security policies, and (2) "query security punctuations" (qsps), representing the query-side security policies. The advantages of such a stream-centric security model include flexibility, dynamicity, and speed of enforcement. Furthermore, DSMSs can adapt not only to data-related but also to security-related selectivities, which helps reduce the waste of resources when few subjects have access to streaming data.

    View the Full Publication
  • 01/01/2009A Hierarchical Approach to Model Web Query Interfaces for Web Source IntegrationThomas Kabisch, Eduard Dragut, Clement Yu, Ulf Leser

    Much data on the Web is hidden behind Web query interfaces. In most cases the only means to "surface" the content of a Web database is by formulating complex queries on such interfaces. Applications such as Deep Web crawling and Web database integration require automatic usage of these interfaces. Therefore, an important problem to be addressed is the automatic extraction of query interfaces into an appropriate model. We hypothesize the existence of a set of domain-independent "commonsense design rules" that guides the creation of Web query interfaces. These rules transform query interfaces into schema trees. In this paper we describe a Web query interface extraction algorithm, which combines HTML tokens and the geometric layout of these tokens within a Web page. Tokens are classified into several classes, of which the most significant are text tokens and field tokens. A tree structure is derived for text tokens using their geometric layout. Another tree structure is derived for the field tokens. The hierarchical representation of a query interface is obtained by iteratively merging these two trees. Thus, we convert the extraction problem into an integration problem. Our experiments show the promise of our algorithm: it outperforms previous approaches to extracting query interfaces by about 6.5% in accuracy, as evaluated over three corpora with more than 500 Deep Web interfaces from 15 different domains.

    View the Full Publication
  • 01/01/2009A Hybrid Technique for Private Location-Based Queries with Database ProtectionGabriel Ghinita, Panos Kalnis, Murat Kantarcioglu, Elisa Bertino

    Mobile devices with global positioning capabilities allow users to retrieve points of interest (POI) in their proximity. To protect user privacy, it is important not to disclose exact user coordinates to un-trusted entities that provide location-based services. Currently, there are two main approaches to protect the location privacy of users: (i) hiding locations inside cloaking regions (CRs) and (ii) encrypting location data using private information retrieval (PIR) protocols. Previous work focused on finding good trade-offs between privacy and performance of user protection techniques, but disregarded the important issue of protecting the POI dataset D. For instance, location cloaking requires large-sized CRs, leading to excessive disclosure of POIs (O(|D|) in the worst case). PIR, on the other hand, reduces this bound, but at the expense of high processing and communication overhead. We propose a hybrid, two-step approach to private location-based queries, which provides protection for both the users and the database. In the first step, user locations are generalized to coarse-grained CRs which provide strong privacy. Next, a PIR protocol is applied with respect to the obtained query CR. To prevent excessive disclosure of POI locations, we devise a cryptographic protocol that privately evaluates whether a point is enclosed inside a rectangular region. We also introduce an algorithm to efficiently support PIR on dynamic POI sub-sets. Our method discloses O(1) POIs, orders of magnitude fewer than CR- or PIR-based techniques. Experimental results show that the hybrid approach is scalable in practice, and clearly outperforms the pure-PIR approach in terms of computational and communication overhead.

    View the Full Publication
  • 01/01/2009Access control policy combining: theory meets practiceNinghui Li, Qihua Wang, Wahbeh Qardaji, Elisa Bertino

    Many access control policy languages, e.g., XACML, allow a policy to contain multiple sub-policies, and the result of the policy on a request is determined by combining the results of the sub-policies according to some policy combining algorithms (PCAs). Existing access control policy languages, however, do not provide a formal language for specifying PCAs. As a result, it is difficult to extend them with new PCAs. While several formal policy combining algebras have been proposed, they did not address important practical issues such as policy evaluation errors and obligations; furthermore, they cannot express PCAs that consider all sub-policies as a whole (e.g., weak majority or strong majority). We propose a policy combining language PCL, which can succinctly and precisely express a variety of PCAs. PCL represents an advancement both in terms of theory and practice. It is based on automata theory and linear constraints, and is more expressive than existing approaches. We have implemented PCL and integrated it with Sun's XACML implementation. With PCL, a policy evaluation engine only needs to understand PCL to evaluate any PCA specified in it.
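
    PCL itself is based on automata and linear constraints; the plain functions below merely illustrate the kinds of PCAs it can express, including a whole-set combinator (strong majority) that XACML's built-in combining algorithms do not provide:

    ```python
    # Two policy combining algorithms over sub-policy decisions: a standard
    # override-style PCA and a whole-set majority PCA. Illustrative sketches,
    # not PCL syntax.

    def deny_overrides(decisions):
        if "deny" in decisions:
            return "deny"
        return "permit" if "permit" in decisions else "not_applicable"

    def strong_majority(decisions):
        permits = decisions.count("permit")
        return "permit" if permits > len(decisions) / 2 else "deny"

    subpolicy_results = ["permit", "permit", "deny"]
    print(deny_overrides(subpolicy_results))    # deny
    print(strong_majority(subpolicy_results))   # permit
    ```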

    View the Full Publication
  • 01/01/2009Adaptive Management of Multigranular Spatio-Temporal Object AttributesElena Camossi, Elisa Bertino, Giovanna Guerrini, Michela Bertolotto

    In applications involving spatio-temporal modelling, granularities of data may have to adapt according to the evolving semantics and significance of data. In this paper we define ST2_ODMGe, a multigranular spatio-temporal model supporting evolutions, which encompass the dynamic adaptation of attribute granularities and the deletion of attribute values. Evolutions are specified as Event-Condition-Action rules and are executed at run-time. The event, the condition, and the action may refer to a period of time and a geographical area. The evolution may also be constrained by the attribute values. The ability to dynamically evolve object attributes results in a more flexible management of multigranular spatio-temporal data, but it requires revisiting the notion of object consistency with respect to class definitions and access to multigranular object values. Both issues are formally investigated in the paper.

    View the Full Publication
  • 01/01/2009An Access Control Language for a General Provenance ModelQun Ni, Shouhuai Xu, Elisa Bertino, Ravi Sandhu

    Provenance access control has been recognized as one of the most important components in an enterprise-level provenance system. However, it has received little attention in the context of data security research. One important challenge in provenance access control is the lack of an access control language that supports its specific requirements, e.g., support for both fine-grained policies and personal preferences, and decision aggregation from different applicable policies. In this paper, we propose an access control language tailored to these requirements.

    View the Full Publication
  • 01/01/2009An algebra for fine-grained integration of XACML policiesPrathima Rao, Dan Lin, Elisa Bertino, Ninghui Li

    Collaborative and distributed applications, such as dynamic coalitions and virtualized grid computing, often require integrating access control policies of collaborating parties. Such an integration must be able to support complex authorization specifications and the fine-grained integration requirements that the various parties may have. In this paper, we introduce an algebra for fine-grained integration of sophisticated policies. The algebra, which consists of three binary and two unary operations, is able to support the specification of a large variety of integration constraints. To assess the expressive power of our algebra, we introduce a notion of completeness and prove that our algebra is complete with respect to this notion. We then propose a framework that uses the algebra for the fine-grained integration of policies expressed in XACML. We also present a methodology for generating the actual integrated XACML policy, based on the notion of Multi-Terminal Binary Decision Diagrams.

    View the Full Publication
  • 01/01/2009Assessing the trustworthiness of location data based on provenanceChenyun Dai, Hyo-Sang Lim, Elisa Bertino, Yang-Sae Moon

    Trustworthiness of location information about particular individuals is of particular interest in the areas of forensic science and epidemic control. In many cases, location information is not precise and may include fraudulent information. With the growth of mobile computing and positioning systems, e.g., GPS and cell phones, it has become possible to trace the location of moving objects. Such systems provide us an opportunity to find out the true locations of individuals. In this paper, we present a model to compute the trustworthiness of the location information of an individual based on different evidence from different sources. We also introduce a collusion attack that may bias the computation. Based on an analysis of the attack, we present an algorithm to detect and reduce the effect of collusion attacks. Our experimental results show the efficiency and effectiveness of our approach.

    View the Full Publication
  • 01/01/2009Assured information sharing: concepts and issuesElisa Bertino

    The need to share information across organizations is an imperative for many organizations, in both the private and public sectors. Sharing must not, however, undermine the privacy and confidentiality of information. Accountability for the use of information, integrity, and support for compliance with organizational policies are also crucial requirements. In this paper we introduce the notion of the assured information sharing lifecycle as a framework for reasoning about approaches and techniques for the secure sharing of information. We then focus on policies relevant in the context of secure information sharing and discuss tools for policy management.

    View the Full Publication
  • 01/01/2009Collaborative Computing: Networking, Applications and Worksharing-A structure preserving approach for securing XML documentsElisa Bertino, Mohamed Nabeel

    With the widespread adoption of XML as the message format to disseminate content over distributed systems, including Web Services and Publish-Subscribe systems, different methods have been proposed for securing messages. We focus on a subset of such systems in which incremental updates are disseminated. The goal of this paper is to develop an approach for disseminating only the updated or accessible portions of XML content while assuring confidentiality and integrity at the message level. While sending only the updates greatly reduces the bandwidth requirements, it introduces the challenge of efficiently assuring security for partial messages disseminated to intermediaries and clients. We propose a novel localized encoding scheme based on conventional cryptographic functions to enforce confidentiality and content integrity at the granularity of the XML node level. We also address structural integrity with respect to the complete XML document to which clients have access. Our solution takes every possible measure to minimize indirect information leakage by making oblivious the portions of the XML document structure to which intermediaries and clients do not have access. The experimental results show that our scheme is superior to conventional techniques for securing XML documents when the percentage of updates with respect to the original documents is low.

    View the Full Publication
  • 01/01/2009Correctness Criteria Beyond SerializabilityMourad Ouzzani, Brahim Medjahed, Ahmed Elmagarmid

    A transaction is a logical unit of work that includes one or more database access operations such as insertion, deletion, modification, and retrieval [8]. A schedule (or history) S of n transactions T1,...,Tn is an ordering of the transactions that satisfies the following two conditions: (i) the operations of Ti (i = 1,...,n) in S must occur in the same order in which they appear in Ti, and (ii) operations from Tj (j ≠ i) may be interleaved with Ti's operations in S. A schedule S is serial if for every two transactions Ti and Tj that appear in S, either all operations of Ti appear before all operations of Tj, or vice versa. Otherwise, the schedule is called nonserial or concurrent. Non-serial schedules of transactions may lead to concurrency problems such as lost update, dirty read, and unrepeatable read. For instance, the lost update problem occurs whenever two transactions, while attempting to modify a data item, both read the item's old value before either of them writes the item's new value [2]. The simplest way of controlling concurrency is to allow only serial schedules. However, with no concurrency, database systems may make poor use of their resources and hence be inefficient, resulting in a lower transaction execution rate, for example. To broaden the class of allowable transaction schedules, serializability has been proposed as the major correctness criterion for concurrency control [7,11]. Serializability ensures that a concurrent schedule of transactions is equivalent to some serial schedule of the same transactions [12]. While serializability has been successfully used in traditional database applications, e.g., airline reservations and banking, it has proven to be restrictive and hardly applicable in advanced applications such as Computer-Aided Design (CAD), Computer-Aided Manufacturing (CAM), office automation, and multidatabases. These applications introduced new requirements that either prevent the use of serializability (e.g., violation of local autonomy in multidatabases) or make the use of serializability inefficient (e.g., long-running transactions in CAD/CAM applications). These limitations have motivated the introduction of more flexible correctness criteria that go beyond traditional serializability.
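
    A standard way to make the serializability notion operational is the conflict-serializability test: build a precedence graph over the transactions and check it for cycles. A small sketch of this textbook construction (not specific to this chapter):

    ```python
    # Conflict-serializability test: add an edge Ti -> Tj for each pair of
    # conflicting operations (same item, at least one write) where Ti's
    # operation comes first; the schedule is conflict serializable iff the
    # resulting precedence graph is acyclic.

    def conflict_serializable(schedule):
        """schedule: list of (txn, op, item), op in {'r', 'w'}, in execution order."""
        edges = set()
        for i, (ti, opi, xi) in enumerate(schedule):
            for tj, opj, xj in schedule[i + 1:]:
                if ti != tj and xi == xj and "w" in (opi, opj):
                    edges.add((ti, tj))
        graph = {}
        for a, b in edges:
            graph.setdefault(a, set()).add(b)
        def has_cycle(node, stack):
            if node in stack:
                return True
            return any(has_cycle(n, stack | {node}) for n in graph.get(node, ()))
        return not any(has_cycle(n, frozenset()) for n in graph)

    # Classic lost-update anomaly: T1 and T2 both read x before either writes it.
    s = [("T1", "r", "x"), ("T2", "r", "x"), ("T1", "w", "x"), ("T2", "w", "x")]
    print(conflict_serializable(s))   # False
    ```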

    View the Full Publication
  • 01/01/2009D-algebra for composing access control policy decisionsQun Ni, Elisa Bertino, Jorge Lobo

    This paper proposes a D-algebra to compose decisions from multiple access control policies. Compared to other algebra-based approaches aimed at policy composition, the D-algebra is the only one that satisfies both functional completeness (any possible decision matrix can be expressed by a D-algebra formula) and computational effectiveness (a formula can be computed efficiently given any decision matrix). The D-algebra has several relevant applications in the context of access control policies, namely the analysis of policy languages' decision mechanisms and the development of tools for policy authoring and enforcement.

    View the Full Publication
  • 01/01/2009Generalization of ACID PropertiesBrahim Medjahed, Mourad Ouzzani, Ahmed Elmagarmid

    ACID (Atomicity, Consistency, Isolation, and Durability) is a set of properties that guarantee the reliability of database transactions [2]. ACID properties were initially developed with traditional, business-oriented applications (e.g., banking) in mind. Hence, they do not fully support the functional and performance requirements of advanced database applications such as computer-aided design, computer-aided manufacturing, office automation, network management, multidatabases, and mobile databases. For instance, transactions in computer-aided design applications are generally of long duration, and preserving the traditional ACID properties in such transactions would require locking resources for long periods of time. This has led to the generalization of ACID properties as Recovery, Consistency, Visibility, and Permanence. The aim of such generalization is to relax some of the constraints and restrictions imposed by the ACID properties. For example, visibility relaxes the isolation property by enabling the sharing of partial results and hence promoting cooperation among concurrent transactions. Hence, the more generalized the ACID properties, the more flexible the corresponding transaction model.

    View the Full Publication
  • 01/01/2009Identity Attribute-Based Role Provisioning for Human WS-BPEL ProcessesFederica Paci, Rodolfo Ferrini, Elisa Bertino

    The WS-BPEL specification focuses on business processes the activities of which are assumed to be interactions with Web services. However, WS-BPEL processes go beyond the orchestration of activities exposed as Web services. There are cases in which people must be considered as additional participants in the execution of a process. The inclusion of humans, in turn, requires solutions to support the specification and enforcement of authorizations to users for the execution of human activities while enforcing authorization constraints. In this paper, we extend RBAC-WS-BPEL, a role-based authorization framework for WS-BPEL processes, with an identity attribute-based role provisioning approach that preserves the privacy of the users who claim the execution of human activities. Such an approach is based on the notion of identity records and role provisioning policies, and uses Pedersen commitments, aggregated zero-knowledge proofs of knowledge, and Oblivious Commitment-Based Envelope protocols to achieve privacy of user identity information.

    View the Full Publication
  • 01/01/2009L'Elimination de la subjectivité dans la recommandation de confianceOmar Hasan, Lionel Brunie, Jean-Marc Pierson, Elisa Bertino

    In ubiquitous environments, a party who wishes to make a transaction often requires that it has a certain level of trust in the other party. It is frequently the case that the parties are unknown to each other and thus share no preexisting trust. Trust-based systems enable users to establish trust in unknown users through trust recommendation from known users. For example, Bob may choose to trust an unknown user Carol when he receives a recommendation from his friend Alice that Carol's trustworthiness is 0.8 on the interval [0, 1]. In this paper we highlight the problem that when a trust value is recommended by one user to another it may lose its real meaning due to subjectivity. Bob may regard 0.8 as a very high value of trust but it is possible that Alice perceived this same value as only average. We present a solution for the elimination of subjectivity from trust recommendation. We run experiments to compare our subjectivity-eliminated trust recommendation method with the unmodified method. In a random graph based web of trust with high subjectivity, it is observed that the novel method can give better results up to 95% of the time.

    View the Full Publication
  • 01/01/2009Multi-granular Spatio-temporal Object Models: Concepts and Research DirectionsElisa Bertino, Elena Camossi, Michela Bertolotto

    The capability of representing spatio-temporal objects is fundamental when analysing and monitoring the changes in the spatial configuration of a geographical area over a period of time. An important requirement when managing spatio-temporal objects is the support for multiple granularities. In this paper we discuss how the modelling constructs of object data models can be extended for representing and querying multi-granular spatio-temporal objects. In particular, we describe object-oriented formalizations for granularities, granules, and multi-granular values, exploring the issues of value conversions. Furthermore, we formally define an object-oriented multi-granular query language, and discuss dynamic multi-granularity. Finally, we discuss open research issues.

    View the Full Publication
  • 01/01/2009Outsourcing Search Services on Private Spatial DataMan Lung Yiu, Gabriel Ghinita, Christian Jensen, Panos Kalnis

    Cloud computing services enable organizations and individuals to outsource the management of their data to a service provider in order to save on hardware investments and reduce maintenance costs. Only authorized users are allowed to access the data. Nobody else, including the service provider, should be able to view the data. For instance, a real-estate company that owns a large database of properties wants to allow its paying customers to query for houses according to location. On the other hand, the untrusted service provider should not be able to learn the property locations and, e.g., sell the information to a competitor. To tackle the problem, we propose to transform the location datasets before uploading them to the service provider. The paper develops a spatial transformation that re-distributes the locations in space, and it also proposes a cryptographic-based transformation. The data owner selects the transformation key and shares it with authorized users. Without the key, it is infeasible to reconstruct the original data points from the transformed points. The proposed transformations present distinct trade-offs between query efficiency and data confidentiality. In addition, we describe attack models for studying the security properties of the transformations. Empirical studies demonstrate that the proposed methods are efficient and applicable in practice.

    View the Full Publication
  • 01/01/2009Physically restricted authentication with trusted hardwareMichael Kirkpatrick, Elisa Bertino

    Modern computer systems permit users to access protected information from remote locations. In certain secure environments, it would be desirable to restrict this access to a particular computer or set of computers. Existing solutions of machine-level authentication are undesirable for two reasons. First, they do not allow fine-grained application layer access decisions. Second, they are vulnerable to insider attacks in which a trusted administrator acts maliciously. In this work, we describe a novel approach using secure hardware that solves these problems. In our design, multiple administrators are required for installation of a system. After installation, the authentication privileges are physically linked to that machine, and no administrator can bypass these controls. We define an administrative model and detail the requirements for an authentication protocol to be compatible with our methodology. Our design presents some challenges for large-scale systems, in addition to the benefit of reduced maintenance.

    View the Full Publication
  • 01/01/2009Preventing velocity-based linkage attacks in location-aware applicationsGabriel Ghinita, Maria Luisa Damiani, Claudio Silvestri, Elisa Bertino

    Mobile devices with positioning capabilities allow users to participate in novel and exciting location-based applications. For instance, users may track the whereabouts of their acquaintances in location-aware social networking applications, e.g., GoogleLatitude. Furthermore, users can request information about landmarks in their proximity. Such scenarios require users to report their coordinates to other parties, which may not be fully trusted. Reporting precise locations may result in serious privacy violations, such as disclosure of lifestyle details, sexual orientation, etc. A typical approach to preserve location privacy is to generate a cloaking region (CR) that encloses the user position. However, if locations are continuously reported, an attacker can correlate CRs from multiple timestamps to accurately pinpoint the user position within a CR. In this work, we protect against linkage attacks that infer exact locations based on prior knowledge about maximum user velocity. Assume user u who reports two consecutive cloaked regions A and B. We consider two distinct protection scenarios: in the first case, the attacker does not have information about the sensitive locations on the map, and the objective is to ensure that u can reach some point in B from any point in A. In the second case, the attacker knows the placement of sensitive locations, and the objective is to ensure that u can reach any point in B from any point in A. We propose spatial and temporal cloaking transformations to preserve user privacy, and we show experimentally that privacy can be achieved without significant quality of service deterioration.
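
    The geometric core of both scenarios reduces to distance checks between consecutive rectangles; a sketch under the assumption of free movement at maximum speed v_max:

    ```python
    # Velocity check behind both the attack and the defense: given cloaked
    # rectangles A and B reported dt seconds apart, test whether *some* point
    # of B is reachable from A (first scenario) and whether *every* point of
    # B is reachable from every point of A (second scenario).

    def min_dist(a, b):
        """Minimum distance between axis-aligned rectangles (x1, y1, x2, y2)."""
        dx = max(b[0] - a[2], a[0] - b[2], 0)
        dy = max(b[1] - a[3], a[1] - b[3], 0)
        return (dx * dx + dy * dy) ** 0.5

    def max_dist(a, b):
        """Maximum distance between any point of a and any point of b."""
        dx = max(abs(b[2] - a[0]), abs(a[2] - b[0]))
        dy = max(abs(b[3] - a[1]), abs(a[3] - b[1]))
        return (dx * dx + dy * dy) ** 0.5

    A, B, v_max, dt = (0, 0, 100, 100), (150, 0, 250, 100), 10.0, 12.0
    print(min_dist(A, B) <= v_max * dt)   # some-point reachability: True
    print(max_dist(A, B) <= v_max * dt)   # all-points reachability: False
    ```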

    View the Full Publication
  • 01/01/2009Privacy Preserving OLAP over Distributed XML DocumentsElisa Bertino, Alfredo Cuzzocrea

    We introduce a novel Privacy Preserving Distributed Data Mining routine over collections of XML documents stored in distributed environments, called secure distributed OLAP aggregation, which plays a critical role in next-generation distributed Business Intelligence (BI) scenarios. In order to effectively and efficiently support secure distributed OLAP aggregation routines in such scenarios, a privacy preserving distributed OLAP framework that embeds several points of innovation in the context of privacy preserving OLAP research is hence proposed and deeply investigated in this paper.

    View the Full Publication
  • 01/01/2009Privacy-preserving Digital Identity Management for Cloud ComputingElisa Bertino, Federica Paci, Rodolfo Ferrini, Ning Shang

    Digital identity management services are crucial in cloud computing infrastructures to authenticate users and to support flexible access control to services, based on user identity properties (also called attributes) and past interaction histories. Such services should preserve the privacy of users, while at the same time enhancing interoperability across multiple domains and simplifying the management of identity verification. In this paper we propose an approach addressing such requirements, based on the use of high-level identity verification policies expressed in terms of identity attributes, zero-knowledge proof protocols, and semantic matching techniques. The paper describes the basic techniques we adopt and the architecture of a system developed based on these techniques, and reports experimental performance results.

    View the Full Publication
  • 01/01/2009Privacy-preserving management of transactions' receipts for mobile environmentsFederica Paci, Ning Shang, Sam Kerr, Kevin Steuer Jr, Jungha Woo, Elisa Bertino

    Users increasingly use their mobile devices for electronic transactions and to store related information, such as digital receipts. However, such information can be the target of several attacks. There are several security issues related to M-commerce: the loss or theft of mobile devices results in an exposure of transaction information; transaction receipts that are sent over Wi-Fi or 3G networks can be easily intercepted; transaction receipts can also be captured via Bluetooth connections without the user's consent; and mobile viruses, worms, and Trojan horses can access the transaction information stored on mobile devices if this information is not protected by passwords or PINs. Therefore, assuring the privacy and security of transaction information, as well as of any sensitive information stored on mobile devices, is crucial. In this paper, we propose a privacy-preserving approach to manage electronic transaction receipts on mobile devices. The approach is based on the notion of transaction receipts issued by service providers upon a successful transaction, and combines Pedersen commitments, Zero Knowledge Proof of Knowledge (ZKPK) techniques, and Oblivious Commitment-Based Envelope (OCBE) protocols. We have developed a version of this protocol for Near Field Communication (NFC) enabled cellular phones.

    View the Full Publication
  • 01/01/2009Private Queries and Trajectory Anonymization: a Dual Perspective on Location PrivacyGabriel Ghinita

    The emergence of mobile devices with Internet connectivity (e.g., Wi-Fi) and global positioning capabilities (e.g., GPS) has triggered the widespread development of location-based applications. For instance, users are able to ask queries about points of interest in their proximity. Furthermore, users can act as mobile sensors to monitor traffic flow or levels of air pollution. However, such applications require users to disclose their locations, which raises serious privacy concerns. With knowledge of user locations, a malicious attacker can infer sensitive information, such as alternative lifestyles or political affiliations. Preserving location privacy is an essential requirement towards the successful deployment of location-based services (LBS). Currently, two main LBS use scenarios exist: in the first one, users send location-based queries to an un-trusted server, and the privacy objective is to protect the location of the querying user. In the second setting, a trusted entity, such as a telephone company, gathers large amounts of location data (i.e., trajectory traces) and wishes to publish them for data mining (e.g., alleviating traffic congestion). In this case, it is crucial to prevent an adversary from associating trajectories with user identities. In this survey paper, we give an overview of the state of the art in location privacy protection from the dual perspective of query privacy and trajectory anonymization. We review the most prominent design choices and technical solutions, and highlight their relative strengths and weaknesses.

    View the Full Publication
  • 01/01/2009Query Processing Techniques for Compliance with Data Confidence PoliciesChenyun Dai, Dan Lin, Murat Kantarcioglu, Elisa Bertino

    Data integrity and quality is a very critical issue in many data-intensive decision-making applications. In such applications, decision makers need to be provided with high quality data on which they can rely with high confidence. A key issue is that obtaining high quality data may be very expensive. We thus need flexible solutions to the problem of data integrity and quality. This paper proposes one such solution based on four key elements. The first element is the association of a confidence value with each data item in the database. The second element is the computation of the confidence values of query results by using lineage propagation. The third element is the notion of confidence policies. Such a policy restricts access to the query results by specifying the minimum confidence level that is required for use in a certain task by a certain subject. The fourth element is an approach to dynamically increment the data confidence level to return query results that satisfy the stated confidence policies. In particular, we propose several algorithms for incrementing the data confidence level while minimizing the additional cost. Our experimental results demonstrate the efficiency and effectiveness of our approach.
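
    The first three elements can be sketched in a few lines; the min-over-lineage rule below is one conservative choice of propagation rule, assumed here purely for illustration:

    ```python
    # Sketch: each base tuple carries a confidence value, a query result's
    # confidence is propagated through its lineage (here, as the minimum over
    # contributing tuples), and a confidence policy filters out results below
    # the threshold required for the task at hand.

    def result_confidence(lineage, confidences):
        """lineage: ids of the base tuples the result was derived from."""
        return min(confidences[t] for t in lineage)

    def apply_policy(results, confidences, threshold):
        return [r for r in results
                if result_confidence(r["lineage"], confidences) >= threshold]

    confidences = {"t1": 0.9, "t2": 0.6, "t3": 0.95}
    results = [{"value": "A", "lineage": ["t1", "t3"]},
               {"value": "B", "lineage": ["t1", "t2"]}]
    print(apply_policy(results, confidences, threshold=0.8))
    # Only "A" qualifies; raising t2's confidence (at a cost) would admit "B".
    ```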

    View the Full Publication
  • 01/01/2009Scalable and Effective Test Generation for Role-Based Access Control SystemsAmmar Masood, Rafae Bhatti, Arif Ghafoor, Aditya P. Mathur

    Conformance testing procedures for generating tests from the finite state model representation of Role-Based Access Control (RBAC) policies are proposed and evaluated. A test suite generated using one of these procedures has excellent fault detection ability but is astronomically large. Two approaches to reduce the size of the generated test suite were investigated. One is based on a set of six heuristics and the other directly generates a test suite from the finite state model using random selection of paths in the policy model. Empirical studies revealed that the second approach to test suite generation, combined with one or more heuristics, is most effective in the detection of both first-order mutation and malicious faults and generates a significantly smaller test suite than the one generated directly from the finite state models.

    View the Full Publication
  • 01/01/2009Security Analysis of the SASI ProtocolTianjie Cao, Elisa Bertino, Hong Lei

    Ultralightweight RFID protocols involve only simple bit-wise operations (such as XOR, AND, and OR) on tags. In this paper, we show that the ultralightweight strong authentication and strong integrity (SASI) protocol has two security vulnerabilities, namely denial-of-service (DoS) and anonymity tracing based on a compromised tag. The former permanently disables the authentication capability of an RFID tag by destroying the synchronization between the tag and the RFID reader. The latter links a compromised tag with past actions performed on this tag.

    View the Full Publication
  • 01/01/2009Self-tuning query mesh for adaptive multi-route query processingRimma Nehme, Elke Rundensteiner, Elisa Bertino

    In real-life applications, different subsets of data may have distinct statistical properties; e.g., various websites may have diverse visitation rates, and different categories of stocks may have dissimilar price fluctuation patterns. For such applications, it can be fruitful to eliminate the commonly made single-execution-plan assumption and instead execute a query using several plans, each optimally serving a subset of data with particular statistical properties. Furthermore, in dynamic environments, data properties may change continuously, thus calling for adaptivity. The intriguing question is: can we have an execution strategy that (1) is plan-based to leverage all the benefits of traditional plan-based systems, (2) supports multiple plans, each customized for a different subset of data, and yet (3) is as adaptive as "plan-less" systems like Eddies? While the recently proposed Query Mesh (QM) approach provides a foundation for such an execution paradigm, it does not address the question of adaptivity required for highly dynamic environments. In this work, we fill this gap by proposing a Self-Tuning Query Mesh (ST-QM), an adaptive solution for content-based multi-plan execution engines. ST-QM addresses adaptive query processing by abstracting it as a concept drift problem, a well-known subject in machine learning. Such an abstraction makes it possible to discard adaptivity candidates (i.e., the cases indicating a change in the environment) early in the process if they are insignificant or not "worthwhile" to adapt to, and thus to minimize the adaptivity overhead. A unique feature of our approach is that all logical transformations to the execution strategy get translated into a single inexpensive physical operation: the classifier change. Our experimental evaluation using a continuous query engine shows the performance benefits of the ST-QM approach over the alternatives, namely the non-adaptive and the Eddies-based solutions.
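
    A toy rendering of this execution model: a cheap classifier routes each tuple to one of several plans, and a naive drift check swaps the classifier when recent data no longer matches it. The threshold "classifier" and moving-average drift test below are illustrative stand-ins for ST-QM's learned models.

    ```python
    # Sketch of content-based multi-plan routing with a drift check.
    from collections import deque

    def plan_hot(t):  return ("hot-path", t)    # stand-ins for two customized query plans
    def plan_cold(t): return ("cold-path", t)

    class QueryMesh:
        def __init__(self, threshold):
            self.threshold = threshold          # the "classifier": routes by value
            self.window = deque(maxlen=100)     # recent tuples, for drift detection

        def route(self, t):
            self.window.append(t)
            self._maybe_adapt()
            return plan_hot(t) if t >= self.threshold else plan_cold(t)

        def _maybe_adapt(self):
            # Naive concept-drift check: if the recent mean strays far from the
            # threshold, swap in a re-centered classifier (one cheap change).
            if len(self.window) == self.window.maxlen:
                mean = sum(self.window) / len(self.window)
                if abs(mean - self.threshold) > 10:
                    self.threshold = mean       # the "classifier change"

    qm = QueryMesh(threshold=50)
    for value in [42, 61, 55, 90, 12]:
        print(qm.route(value))
    ```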

    View the Full Publication
  • 01/01/2009Spatial AnonymityPanos Kalnis, Gabriel Ghinita

    Let U be a user who is asking, via a mobile device (e.g., phone, PDA), a query relevant to his current location, such as “find the nearest betting office.” This query can be answered by a Location Based Service (LBS) in a public web server (e.g., Google Maps, MapQuest), which is not trustworthy. Since the query may be sensitive, U uses encryption and a pseudonym in order to protect his privacy. However, the query still contains the exact location, which may reveal the identity of U. For example, if U asks the query within his residence, an attacker may use public information (e.g., white pages) to associate the location with U. Spatial k-Anonymity (SKA) solves this problem by ensuring that an attacker cannot identify U as the querying user with probability larger than 1/k, where k is a user-defined anonymity requirement. To achieve this, a centralized or distributed anonymization service replaces the exact location of U with an area (called Anonymizing Spatial Region or ASR). The ASR encloses U and at least k - 1 additional users. The LBS receives the ASR and retrieves the query results for any point inside the ASR. Those results are forwarded to the anonymization service, which removes the false hits and returns the actual answer to U.
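
    The ASR construction can be sketched directly from this description; the square region, fixed expansion step, and the anonymizer's exact knowledge of all user coordinates are simplifying assumptions.

    ```python
    # Sketch: grow an Anonymizing Spatial Region (ASR) around querying user U
    # until it encloses at least k users in total.
    def anonymizing_region(user, others, k, step=1.0):
        ux, uy = user
        half = step
        while True:
            inside = [(x, y) for (x, y) in others
                      if abs(x - ux) <= half and abs(y - uy) <= half]
            if len(inside) >= k - 1:          # U plus k-1 others => k users
                return (ux - half, uy - half, ux + half, uy + half)
            half += step                      # enlarge the ASR and retry

    users = [(1.0, 1.5), (4.0, 0.5), (2.5, 2.0), (9.0, 9.0)]
    print(anonymizing_region(user=(2.0, 1.0), others=users, k=3))
    ```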

    View the Full Publication
  • 01/01/2009Stop Word and Related Problems in Web Interface IntegrationEduard Dragut, Fang Fang, Prasad Sistla, Clement Yu

    The goal of recent research projects on integrating Web databases has been to enable uniform access to the large amount of data behind query interfaces. Among the tasks addressed are: source discovery, query interface extraction, schema matching, etc. There are also a number of tasks that are commonly ignored or assumed to be solved a priori, either manually or by some oracle. These tasks include (1) finding the set of stop words and (2) handling occurrences of "semantic enrichment words" within labels. These two subproblems have a direct impact on determining the synonymy and hyponymy relationships between labels. In (1), a word like "from" is a stop word in general but it is a content word in domains such as Airline and Real Estate. We formulate the stop word problem, prove its complexity and provide an approximation algorithm. In (2), we study the impact of words like AND and OR on establishing semantic relationships between labels (e.g. "departure date and time" is a hypernym of "departure date"). In addition, we develop a theoretical framework to differentiate the synonymy relationship from the hyponymy relationship among labels involving multiple words. We scrutinize its strengths and limitations both analytically and experimentally. We use real data from the Web in our experiments. We analyze over 2300 labels of 220 user interfaces in 9 distinct domains.

    View the Full Publication
  • 01/01/2009Supporting RBAC with XACML+OWLRodolfo Ferrini, Elisa Bertino

    XACML does not natively support RBAC and even the specialized XACML profiles are not able to support many relevant constraints, such as static and dynamic separation of duty. Extending XACML to support such constraints, however, is an issue that requires extensions not only to the XACML language but also to the XACML reference architecture and engine. In this paper we introduce XACML+OWL, a framework that integrates OWL ontologies and XACML policies for supporting RBAC. The basic idea is to decouple the design of an RBAC system by modeling the role hierarchy and the constraints with an OWL ontology and the authorization policies with XACML. In doing this, we introduce new functions that extend policies with semantic reasoning services based on the OWL ontology. As part of such extension, we extend the reference architecture of XACML and the XACML data-flow for access control decisions with the invocation of such functions.

    View the Full Publication
  • 01/01/2009The Challenge of Assuring Data TrustworthinessElisa Bertino, Chenyun Dai, Murat Kantarcioglu

    With the increased need for data sharing among multiple organizations, such as government organizations, financial corporations, medical hospitals and academic institutions, it is critical to ensure that data is trustworthy so that effective decisions can be made based on these data. In this paper, we first discuss motivations and requirements for data trustworthiness. We then present an architectural framework for a comprehensive system for trustworthiness assurance. We then discuss an important issue in our framework, that is, the evaluation of data provenance, and survey a trust model for estimating the confidence level of the data and the trust level of data providers. By taking into account confidence about data provenance, we introduce an approach for policy-observing query evaluation. We highlight open research issues and research directions throughout the paper.

    View the Full Publication
  • 01/01/2009The Design and Evaluation of Accountable Grid Computing SystemWonjun Lee, Anna Squicciarini, Elisa Bertino

    Accountability is an important aspect of any computer system. It assures that every action executed in the system can be traced back to some entity. Accountability is even more crucial for assuring the safety and security of grid systems, given the very large number of users active in these sophisticated environments. However, to date no comprehensive approach to accountability for grid systems exists. Our work addresses such inadequacy by developing a comprehensive accountability system driven by policies and supported by accountability agents. In this paper we first discuss the requirements that have driven the design of our accountability system and then present some interesting aspects related to our accountability framework. We describe a fully working implementation of our accountability system, and conduct extensive experimental evaluations. Our experiments, carried out using the Emulab testbed, demonstrate that the implemented system is efficient and that it scales to grid systems with large numbers of resources and users.

    View the Full Publication
  • 01/01/2009The plant ionome coming into focusLorraine Williams, David Salt

    Ninety-two elements have been identified on Earth, and 17 of these are known to be essential to all plants. The essential elements required in relatively large amounts (>0.1% of dry mass) are called macronutrients and include C, H, O, N, S, P, Ca, K, Mg. Those required in much smaller amounts (<0.01% of dry mass) are referred to as micronutrients or trace elements and include Ni, Mo, Cu, Zn, Mn, B, Fe, and Cl. Plant growth and development depend on a balanced supply of these essential elements, and thus plants have a range of homeostatic mechanisms operating to ensure that this balance is maintained. Beneficial elements, which promote growth and may be essential to some taxa, include Na, Co, Al, Se and Si. Elements such as the heavy metal Cd and the metalloid As have no demonstrated biological function in plants, but are nevertheless taken up and cause severe toxicity in most plant species. The concept for this special issue is the plant ionome, a word coined to encompass all these elements and allow focussed discussion and investigations on the mechanisms that co-ordinately regulate these elements in response to genetic and environmental factors (reviewed in Salt et al., 2008).

    View the Full Publication
  • 01/01/2009The SCIFC Model for Information Flow Control in Web Service CompositionWei She, I-Ling Yen, Bhavani Thuraisingham, Elisa Bertino

    Existing web service access control models focus on individual web services, and do not consider service composition. In composite services, a major issue is information flow control. Critical information may flow from one service to another in a service chain through requests and responses, and there is no mechanism for verifying that the flow complies with the access control policies. In this paper, we propose an innovative access control model to empower the services in a service chain to control the flow of their sensitive information. Our model supports information flow control through a back-check procedure and pass-on certificates. We also introduce additional factors such as the carry-along policy, security class, and transformation factor, to improve the protocol efficiency. A formal analysis is also presented to show the power and complexity of our protocol.

    View the Full Publication
  • 01/01/2009Using Anonymized Data for ClassificationAli Inan, Murat Kantarcioglu, Elisa Bertino

    In recent years, anonymization methods have emerged as an important tool to preserve individual privacy when releasing privacy sensitive data sets. This interest in anonymization techniques has resulted in a plethora of methods for anonymizing data under different privacy and utility assumptions. At the same time, there has been little research addressing how to effectively use the anonymized data for data mining in general and for distributed data mining in particular. In this paper, we propose a new approach for building classifiers using anonymized data by modeling anonymized data as uncertain data. In our method, we do not assume any probability distribution over the data. Instead, we propose collecting all necessary statistics during anonymization and releasing these together with the anonymized data. We show that releasing such statistics does not violate anonymity. Experiments spanning various alternatives both in local and distributed data mining settings reveal that our method performs better than heuristic approaches for handling anonymized data.

    View the Full Publication
  • 01/01/2009Privacy-Preserving Accountable Accuracy Management Systems (PAAMS)Roshan Thomas, Ravi Sandhu, Elisa Bertino, Budak Arpinar, Shouhuai Xu

    We argue for the design of “Privacy-preserving Accountable Accuracy Management Systems (PAAMS)”. The designs of such systems recognize from the outset that accuracy, accountability, and privacy management are intertwined. As such, these systems have to dynamically manage the tradeoffs between these (often conflicting) objectives. For example, accuracy in such systems can be improved by providing better accountability links between structured and unstructured information. Further, accuracy may be enhanced if access to private information is allowed in controllable and accountable ways. Our proposed approach involves three key elements. First, a model to link unstructured information such as that found in email, image and document repositories with structured information such as that in traditional databases. Second, a model for accuracy management and entity disambiguation by proactively preventing, detecting and tracing errors in information bases. Third, a model to provide privacy-governed operation as accountability and accuracy are managed.

    View the Full Publication
  • 01/01/2009Beyond k-Anonymity: A Decision Theoretic Framework for Assessing Privacy RiskGuy Lebanon, Monica Scannapieco, Mohamed Fouad, Elisa Bertino

    An important issue that any organization or individual has to face when managing data containing sensitive information is the risk that can be incurred when releasing such data. Even though data may be sanitized before being released, it is still possible for an adversary to reconstruct the original data using additional information, thus resulting in privacy violations. To date, however, a systematic approach to quantify such risks is not available. In this paper we develop a framework, based on statistical decision theory, that assesses the relationship between the disclosed data and the resulting privacy risk. We model the problem of deciding which data to disclose in terms of deciding which disclosure rule to apply to a database. We assess the privacy risk by taking into account both the entity identification and the sensitivity of the disclosed information. Furthermore, we prove that, under some conditions, the estimated privacy risk is an upper bound on the true privacy risk. Finally, we relate our framework with the k-anonymity disclosure method. The proposed framework makes the assumptions behind k-anonymity explicit, quantifies them, and extends them in several natural directions.

    View the Full Publication
  • 01/01/2009Location Privacy in Moving-Object EnvironmentsDan Lin, Elisa Bertino, Reynold Cheng, Sunil Prabhakar

    The expanding use of location-based services has profound implications on the privacy of personal information. If no adequate protection is adopted, information about movements of specific individuals could be disclosed to unauthorized subjects or organizations, thus resulting in privacy breaches. In this paper, we propose a framework for preserving location privacy in moving-object environments. Our approach is based on the idea of sending to the service provider suitably modified location information. Such modifications, which include transformations like scaling, are performed by agents interposed between users and service providers. Agents execute data transformation and the service provider directly processes the transformed dataset. Our technique not only prevents the service provider from knowing the exact locations of users, but also protects information about user movements and locations from being disclosed to other users who are not authorized to access this information. A key characteristic of our approach is that it achieves privacy without degrading service quality. We also define a privacy model to analyze our framework, and examine our approach experimentally.
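
    A minimal sketch of the agent's role, assuming an invertible scale-and-shift transformation with illustrative parameters; the transformations actually used in the paper may differ.

    ```python
    # Sketch: the agent transforms locations before sending them to the
    # service provider, and inverts the transformation on the way back.
    SCALE = (2.5, 0.4)        # secret per-user parameters held by the agent
    SHIFT = (130.0, -42.0)

    def transform(p):
        (sx, sy), (dx, dy) = SCALE, SHIFT
        return (p[0] * sx + dx, p[1] * sy + dy)

    def invert(p):
        (sx, sy), (dx, dy) = SCALE, SHIFT
        return ((p[0] - dx) / sx, (p[1] - dy) / sy)

    true_location = (40.42, -86.91)
    sent = transform(true_location)      # what the provider sees
    print(sent)
    assert all(abs(a - b) < 1e-9 for a, b in zip(invert(sent), true_location))
    ```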

    View the Full Publication
  • 01/01/2009Foreword for the special issue of selected papers from the 1st ACM SIGSPATIAL Workshop on Security and Privacy in GIS and LBSElisa Bertino, Maria Luisa Damiani

    The first Workshop on Security and Privacy in GIS and LBS (SPRINGL 2008) was organized on November 4, 2008 in Irvine (CA) in conjunction with the SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2008). The goal of the SPRINGL workshop series is to provide a forum for researchers working in the area of geospatial data security and privacy. Both security and privacy are critical for geospatial applications because of the dramatic increase and dissemination of geospatial data in several application contexts including homeland security, environmental crises, and natural and industrial disasters. Furthermore, geospatial infrastructures are being leveraged by companies to provide a large variety of location-based services (LBS) able to tailor services to users. However, despite the increase of publicly accessible geospatial information, little attention is being paid to how to secure geospatial information systems (GIS) and LBS. Privacy is also of increasing concern, given the sensitivity of personally identifiable location information. This is despite major advancements that have been made in secure computing infrastructures and the secure and privacy-preserving management of traditional (relational) data in particular. The discussion at the workshop spanned across security and privacy aspects, as they relate to the management of geospatial data and to the development of emerging LBS. The present special issue of Transactions on Data Privacy contains four extended papers, focusing on privacy, that have been selected from the papers presented at SPRINGL 2008.

    View the Full Publication
  • 01/01/2009Efficient integration of fine-grained access control and resource brokering in gridPietro Mazzoleni, Bruno Crispo, Swaminathan Sivasubramanian, Elisa Bertino

    In this paper, we present a novel resource brokering service for grid systems which considers authorization policies of the grid nodes in the process of selecting the resources to be assigned to a request. We argue such an integration is needed to avoid scheduling requests onto resources the policies of which do not authorize their execution. Our service, implemented in Globus as a part of Monitoring and Discovery Service (MDS), is based on the concept of fine-grained access control (FGAC) which enables participating grid nodes to specify fine-grained policies concerning the conditions under which grid clients can access their resources. Since the process of evaluating authorization policies, in addition to checking the resource requirements, can be a potential bottleneck for a large scale grid, we also analyze the problem of the efficient evaluation of FGAC policies. In this context, we present GroupByRule, a novel method for policy organization and compare its performance with other strategies.

    View the Full Publication
  • 01/01/2009Query Mesh: MultiRouteRimma Nehme, Karen Works, Elke Rundensteiner, Elisa Bertino

    We propose to demonstrate a practical alternative approach to the current state-of-the-art query processing techniques, called the “Query Mesh” (or QM, for short). The main idea of QM is to compute multiple routes (i.e., query plans), each designed for a particular subset of data with distinct statistical properties. Based on the execution routes and the data characteristics, a classifier model is induced and is used to partition new data tuples to assign the best routes for their processing. We propose to demonstrate the QM framework in the streaming context using our demo application, called the “Ubi-City”. We will illustrate the innovative features of QM, including: the QM optimization with the integrated machine learning component, the QM execution using the efficient “Self-Routing Fabric” infrastructure, and finally, the QM adaptive component that performs the online adaptation of QM with near-zero runtime overhead.

    View the Full Publication
  • 01/01/2009A distributed approach to enabling privacy-preserving model-based classifier trainingHangzai Luo, Xiaodong Lin, Aoying Zhou, Elisa Bertino

    This paper proposes a novel approach for privacy-preserving distributed model-based classifier training. Our approach is an important step towards supporting customizable privacy modeling and protection. It consists of three major steps. First, each data site independently learns a weak concept model (i.e., local classifier) for a given data pattern or concept by using its own training samples. An adaptive EM algorithm is proposed to select the model structure and estimate the model parameters simultaneously. The second step deals with combined classifier training by integrating the weak concept models that are shared from multiple data sites. To reduce the data transmission costs and the potential privacy breaches, only the weak concept models are sent to the central site and synthetic samples are directly generated from these shared weak concept models at the central site. Both the shared weak concept models and the synthetic samples are then incorporated to learn a reliable and complete global concept model. A computational approach is developed to automatically achieve a good trade-off between the privacy disclosure risk, the sharing benefit and the data utility. The third step deals with validating the combined classifier by distributing the global concept model to all the data sites in the collaboration network while at the same time limiting the potential privacy breaches. Our approach has been validated through extensive experiments carried out on four UCI machine learning data sets and two image data sets.
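
    A rough sketch of the second step, with fixed-size scikit-learn Gaussian mixtures standing in for the paper's adaptive-EM weak concept models (which also select the model structure): only model parameters cross site boundaries, and the central site trains on synthetic samples.

    ```python
    # Sketch: sites share generative models, not data; the center samples
    # synthetic points from the shared models and trains a global classifier.
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    site_a = rng.normal(loc=[0, 0], scale=1.0, size=(200, 2))   # class 0, site A
    site_b = rng.normal(loc=[3, 3], scale=1.0, size=(200, 2))   # class 1, site B

    # Each site fits and shares only its local model parameters.
    model_a = GaussianMixture(n_components=2, random_state=0).fit(site_a)
    model_b = GaussianMixture(n_components=2, random_state=0).fit(site_b)

    # Central site: generate synthetic samples instead of moving raw data.
    xa, _ = model_a.sample(200)
    xb, _ = model_b.sample(200)
    X = np.vstack([xa, xb])
    y = np.array([0] * len(xa) + [1] * len(xb))

    global_clf = LogisticRegression().fit(X, y)
    print(global_clf.predict([[0.2, -0.1], [2.8, 3.1]]))        # -> [0 1]
    ```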

    View the Full Publication
  • 01/01/2009Privacy-preserving incremental data disseminationJi-Won Byun, Tiancheng Li, Elisa Bertino, Ninghui Li, Yonglak Sohn

    Although the k-anonymity and ℓ-diversity models have led to a number of valuable privacy-protecting techniques and algorithms, the existing solutions are currently limited to static data release. That is, it is assumed that a complete dataset is available at the time of data release. This assumption implies a significant shortcoming, as in many applications data collection is rather a continual process. Moreover, the assumption entails “one-time” data dissemination; thus, it does not adequately address today's strong demand for immediate and up-to-date information. In this paper, we consider incremental data dissemination, where a dataset is continuously incremented with new data. The key issue here is that the same data may be anonymized and published multiple times, each time in a different form. Thus, static anonymization (i.e., anonymization which does not consider previously released data) may enable various types of inference. In this paper, we identify such inference issues and discuss some prevention methods.

    View the Full Publication
  • 01/01/2009Security for Web Services: Standards and Research IssuesLorenzo D. Martino, Elisa Bertino

    This article discusses the main security requirements for Web services and describes how such requirements are addressed by standards for Web services security recently developed, or under development, by various standardization bodies. Standards are reviewed according to a conceptual framework that groups them by the main functionalities they provide. Covered standards include most of the standards encompassed by the original Web Service Security roadmap proposed by Microsoft and IBM in 2002 (Microsoft and IBM 2002). They range from those geared toward message and conversation security and reliability to those developed for providing interoperable Single Sign-On and Identity Management functions in federated organizations. The latter include Security Assertion Markup Language (SAML); WS-Policy; XACML, which is related to access control and has recently been extended with a profile for Web services access control; XKMS and WS-Trust; and WS-Federation, Liberty Alliance and Shibboleth, which address the important problem of identity management in federated organizations. The article also discusses issues related to the use of these standards, and outlines open research issues in the areas of access control for Web services and innovative digital identity management techniques.

    View the Full Publication
  • 01/01/2009A comprehensive simulation tool for the analysis of password policiesRichard Shay, Elisa Bertino

    Modern organizations rely on passwords for preventing illicit access to valuable data and resources. A well designed password policy helps users create and manage more effective passwords. This paper offers a novel model and tool for understanding, creating, and testing password policies. We present a password policy simulation model which incorporates such factors as simulated users, accounts, and services. This model and its implementation enable administrators responsible for creating and managing password policies to test them before giving them to actual users. It also allows researchers to test how different password policy factors impact security, without the time and expense of actual human studies. We begin by presenting our password policy simulation model. We next discuss prior work and validate the model by showing how it is consistent with previous research conducted on human users. We then present and discuss experimental results derived using the model.
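
    A toy version of such a simulation, with made-up probabilities for password guessability and for users writing passwords down; it illustrates only the kind of policy trade-off the model is meant to capture.

    ```python
    # Toy password-policy simulation: longer passwords resist guessing but
    # are more likely to be written down. All parameters are illustrative.
    import random

    random.seed(1)

    def simulate(min_length, users=10000):
        compromised = 0
        for _ in range(users):
            guessable = random.random() < 0.5 ** (min_length - 4)  # longer => harder
            wrote_down = random.random() < 0.02 * min_length       # longer => more notes
            if guessable or wrote_down:
                compromised += 1
        return compromised / users

    for policy in (6, 8, 12):
        print(f"min length {policy:2d}: ~{simulate(policy):.1%} accounts at risk")
    ```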

    View the Full Publication
  • 01/01/2009Specification and enforcement of flexible security policy for active cooperationYuqing Sun, Bin Gong, Xiangxu Meng, Zongkai Lin, Elisa Bertino

    Interoperation and services sharing among different systems are becoming new paradigms for enterprise collaboration. To keep ahead in strongly competitive environments, an enterprise should provide flexible and comprehensive services to partners and support active collaborations with partners and customers. Achieving such goals requires enterprises to specify and enforce flexible security policies for their information systems. Although the area of access control has been widely investigated, current approaches still do not support flexible security policies able to account for the different weights that typically characterize the various attributes of the requesting parties and transactions and that reflect the access control criteria relevant for the enterprise. In this paper we propose a novel approach that addresses such flexibility requirements while at the same time reducing the complexity of security management. To support flexible policy specification, we define the notion of restraint rules for authorization management processes and introduce the concept of impact weight for the conditions in these restraint rules. We also introduce a new data structure for the encoding of the condition tree, as well as the corresponding algorithm for efficiently evaluating conditions. Furthermore, we present a system architecture that implements the above approach and supports interoperation among heterogeneous platforms.
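
    One plausible reading of impact weights, sketched below: each satisfied condition contributes its weight, and a restraint rule grants access only if the weighted score reaches a threshold. The conditions, weights, and threshold are all illustrative assumptions, not the paper's actual rule semantics.

    ```python
    # Sketch: weighted evaluation of a restraint rule's conditions.
    conditions = [
        ("is_partner",     0.5, lambda req: req["partner"]),
        ("order_value_ok", 0.3, lambda req: req["order_value"] <= 10000),
        ("business_hours", 0.2, lambda req: 8 <= req["hour"] < 18),
    ]
    THRESHOLD = 0.7

    def evaluate(request):
        score = sum(w for _, w, pred in conditions if pred(request))
        return score >= THRESHOLD

    print(evaluate({"partner": True, "order_value": 8000, "hour": 20}))   # True (0.8)
    print(evaluate({"partner": False, "order_value": 8000, "hour": 10}))  # False (0.5)
    ```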

    View the Full Publication
  • 01/01/2009OpenMP to GPGPU: a compiler framework for automatic translation and optimizationSeyong Lee, Seung-Jai Min, Rudolf Eigenmann

    GPGPUs have recently emerged as powerful vehicles for general-purpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from NVIDIA offers improved programmability for general computing, programming GPGPUs is still complex and error-prone. This paper presents a compiler framework for automatic source-to-source translation of standard OpenMP applications into CUDA-based GPGPU applications. The goal of this translation is to further improve programmability and make existing OpenMP applications amenable to execution on GPGPUs. In this paper, we have identified several key transformation techniques, which enable efficient GPU global memory access, to achieve high performance. Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both regular and irregular applications, leading to performance improvements of up to 50X over the unoptimized translation (up to 328X over serial).

    View the Full Publication
  • 01/01/2009FALCON: a system for reliable checkpoint recovery in shared grid environmentsTanzima Zerin Islam, Saurabh Bagchi, Rudolf Eigenmann

    In Fine-Grained Cycle Sharing (FGCS) systems, machine owners voluntarily share their unused CPU cycles with guest jobs, as long as their performance degradation is tolerable. However, unpredictable evictions of guest jobs lead to fluctuating completion times. Checkpoint-recovery is an attractive mechanism for recovering from such "failures". Today's FGCS systems often use expensive, high-performance dedicated checkpoint servers. However, in geographically distributed clusters, this may incur high checkpoint transfer latencies. In this paper we present a system called Falcon that uses available disk resources of the FGCS machines as shared checkpoint repositories. However, an unavailable storage host may lead to loss of checkpoint data. Therefore, we model failures of storage hosts and develop a prediction algorithm for choosing reliable checkpoint repositories. We experiment with Falcon in the university-wide Condor testbed at Purdue and show improved and consistent performance for guest jobs in the presence of irregular resource availability.
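
    A sketch of the repository-selection idea, with assumed per-host availability probabilities standing in for Falcon's learned failure model: greedily pick the most-available hosts until the chance that at least one replica survives meets a target.

    ```python
    # Sketch: choose checkpoint repositories by predicted availability.
    hosts = {"hostA": 0.99, "hostB": 0.80, "hostC": 0.95, "hostD": 0.60}

    def choose_repositories(hosts, copies, min_reliability):
        # Greedy: take the most-available hosts until the probability that
        # at least one replica survives meets the target.
        chosen, p_all_fail = [], 1.0
        for host, p_up in sorted(hosts.items(), key=lambda kv: -kv[1]):
            chosen.append(host)
            p_all_fail *= (1.0 - p_up)
            if len(chosen) >= copies and 1.0 - p_all_fail >= min_reliability:
                return chosen, 1.0 - p_all_fail
        return chosen, 1.0 - p_all_fail

    repos, reliability = choose_repositories(hosts, copies=2, min_reliability=0.999)
    print(repos, f"{reliability:.5f}")    # ['hostA', 'hostC'] 0.99950
    ```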

    View the Full Publication
  • 01/01/2009Guest Editors’ Introduction - Parallel ProgrammingRudolf Eigenmann, Eduard Ayguade

    OpenMP is an Application Programming Interface (API) widely accepted as a standard for high-level shared-memory parallel programming. OpenMP is a portable, scalable programming model that provides a simple and flexible interface for developing shared-memory parallel applications in Fortran, C, and C++. Since its introduction in 1997 OpenMP has gained support from the majority of high-performance compiler and hardware vendors. There is also active research in OpenMP compilers, runtime systems, tools, and environments. Under the direction of the OpenMP Architecture Review Board (ARB), the OpenMP standard continues to evolve.

    View the Full Publication
  • 12/01/2008Cryptanalysis of Some RFID Authentication ProtocolsTianjie Cao, Peng Shen, Elisa Bertino

    Two effective attacks, namely a de-synchronization attack and an impersonation attack, are identified against Ha et al.'s LCSS RFID authentication protocol and Song and Mitchell's protocol. The former attack can break the synchronization between the RFID reader and the tag in a single protocol run so that they cannot authenticate each other in any subsequent protocol run. The latter can impersonate a legal tag to spoof the RFID reader by extracting the ID of a specific tag during the authentication process. An impersonation attack against Chen et al.'s RFID authentication scheme is also identified. By sending malicious queries to the tag and collecting the response messages emitted by the tag, this attack allows an adversary to extract the secret information from the tag and further to impersonate the legal tag.

    View the Full Publication
  • 12/01/2008A Policy-Based Accountability Tool for Grid Computing SystemsAnna Squicciarini, Wonjun Lee, Elisa Bertino, Carol Song

    The dynamic and multi-organizational nature of Grid systems requires effective and efficient accountability systems that scale to accommodate large numbers of users and resources. The availability of detailed and complete accountability data is crucial for both Grid administrators and the overall Grid community. In this paper we present a layered architecture for addressing the end-to-end accountability problem. We introduce the concept of accountability agents, entities in charge of collecting accountability data and keeping track of submitted jobs and their users. We present a simple yet effective language to specify the relevant accountability data to be collected and selectively distributed by the accountability agents. Additionally, we design a decentralized and scalable approach to accountability, so as to be able to monitor job workflows with relatively little intrusion.

    View the Full Publication
  • 11/12/2008Transcriptome analysis identifies novel responses and potential regulatory genes involved in seasonal dormancy transitions of leafy spurge (Euphorbia esula L.)David P. Horvath, Wun S. Chao, Jeffrey C. Suttle, Jyothi Thimmapuram, James V. Anderson

    Dormancy of buds is a critical developmental process that allows perennial plants to survive extreme seasonal variations in climate. Dormancy transitions in underground crown buds of the model herbaceous perennial weed leafy spurge were investigated using a 23K-element cDNA microarray. These data represent the first large-scale transcriptome analysis of dormancy in underground buds of an herbaceous perennial species. Crown buds collected monthly from August through December, over a five-year period, were used to monitor the changes in the transcriptome during dormancy transitions.

    View the Full Publication
  • 11/01/2008Dynamic Resource Management in Energy Constrained Heterogeneous Computing Systems Using Voltage ScalingJong-Kook Kim, Howard Siegel, Anthony Maciejewski, Rudolf Eigenmann

    An ad hoc grid is a wireless heterogeneous computing environment without a fixed infrastructure. This study considers wireless devices that have different capabilities, have limited battery capacity, support dynamic voltage scaling, and are expected to be used for eight hours at a time and then recharged. To maximize the performance of the system, it is essential to assign resources to tasks (match) and order the execution of tasks on each resource (schedule) in a manner that exploits the heterogeneity of the resources and tasks while considering the energy constraints of the devices. In the single-hop ad hoc grid heterogeneous environment considered in this study, tasks arrive unpredictably, are independent (no precedence constraints among tasks), and have priorities and deadlines. The problem is to map (match and schedule) tasks onto devices such that the number of highest priority tasks completed by their deadlines during eight hours is maximized while efficiently utilizing the overall system energy. A model for dynamically mapping tasks onto wireless devices is introduced. Seven dynamic mapping heuristics for this environment are designed and compared to each other and to a mathematical bound.
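
    As a concrete illustration (not necessarily one of the paper's seven heuristics), here is a greedy mapper that, on task arrival, picks the device that can meet the deadline with the least energy draw; the device parameters are made up.

    ```python
    # Sketch: energy-aware greedy mapping of an arriving task onto a device.
    devices = [
        {"name": "pda",    "speed": 1.0, "power": 1.0, "energy_left": 50.0,  "free_at": 0.0},
        {"name": "laptop", "speed": 4.0, "power": 6.0, "energy_left": 300.0, "free_at": 0.0},
    ]

    def map_task(task, now):
        """task = (work units, deadline); returns the chosen device name or None."""
        work, deadline = task
        best, best_energy = None, float("inf")
        for d in devices:
            start = max(now, d["free_at"])
            runtime = work / d["speed"]
            energy = runtime * d["power"]
            if start + runtime <= deadline and energy <= d["energy_left"] \
                    and energy < best_energy:
                best, best_energy = d, energy
        if best:
            best["free_at"] = max(now, best["free_at"]) + work / best["speed"]
            best["energy_left"] -= best_energy
        return best["name"] if best else None

    print(map_task((8.0, 10.0), now=0.0))   # pda: meets deadline with 8 energy, not 12
    print(map_task((8.0, 4.0), now=0.0))    # laptop: only it meets the tight deadline
    ```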

    View the Full Publication
  • 10/05/2008Development and annotation of perennial Triticeae ESTs and SSR markers.B.S. Bushman, S R Larson, I W Mott, P F Cliften, R R Wang, N J Chatterton, A G Hernandez, S Ali, R W Kim, Jyothi Thimmapuram, G Gong, L Liu, M A Mikel

    Triticeae contains hundreds of species of both annual and perennial types. Although substantial genomic tools are available for annual Triticeae cereals such as wheat and barley, the perennial Triticeae lack sufficient genomic resources for genetic mapping or diversity research. To increase the amount of sequence information available in the perennial Triticeae, three expressed sequence tag (EST) libraries were developed and annotated for Pseudoroegneria spicata, a mixture of both Elymus wawawaiensis and E. lanceolatus, and a Leymus cinereus x L. triticoides interspecific hybrid. The ESTs were combined into unigene sets of 8,780 unigenes for P. spicata, 11,281 unigenes for Leymus, and 7,212 unigenes for Elymus. Unigenes were annotated based on putative orthology to genes from rice, wheat, barley, other Poaceae, Arabidopsis, and the non-redundant database of the NCBI. Simple sequence repeat (SSR) markers were developed, tested for amplification and polymorphism, and aligned to the rice genome. Leymus EST markers homologous to rice chromosome 2 genes were syntenous on Leymus homeologous groups 6a and 6b (previously 1b), demonstrating promise for in silico comparative mapping. All ESTs and SSR markers are available on an EST information management and annotation database (http://titan.biotec.uiuc.edu/triticeae/).

    View the Full Publication
  • 10/01/2008Anonymous Geo-Forwarding in MANETs through Location CloakingXiaoxin Wu, Jun Liu, Xiaoyan Hong, Elisa Bertino

    In this paper, we address the problem of destination anonymity for applications in mobile ad hoc networks where geographic information is ready for use in both ad hoc routing and Internet services. Geographic forwarding serves as a lightweight routing protocol in such scenarios. Traditionally the anonymity of an entity of interest can be achieved by hiding it among a group of other entities with similar characteristics, i.e., an anonymity set. In mobile ad hoc networks, generating and maintaining an anonymity set for any ad hoc node is challenging because of the node mobility and, consequently, the dynamic network topology. We propose protocols that use the destination position to generate a geographic area called an anonymity zone (AZ). A packet for a destination is delivered to all the nodes in the AZ, which make up the anonymity set. The size of the anonymity set may decrease because nodes are mobile, yet the corresponding anonymity set management is simple. We design techniques to further improve node anonymity and reduce communication overhead. We use analysis and extensive simulation to study the node anonymity and routing performance, and to determine the parameters that most impact the anonymity level that can be achieved by our protocol.

    View the Full Publication
  • 10/01/2008Secure Collaboration in a Mediator-Free Distributed EnvironmentMohamed Shehab, Arif Ghafoor, Elisa Bertino

    The internet and related technologies have made multidomain collaborations a reality. Collaboration enables domains to effectively share resources; however, it introduces several security and privacy challenges. Managing security in the absence of a central mediator is even more challenging. In this paper, we propose a distributed secure interoperability framework for mediator-free collaboration environments. We introduce the idea of secure access paths, which enables domains to make localized access control decisions without having a global view of the collaboration. We also present a path authentication technique for proving path authenticity. Furthermore, we present on-demand path discovery algorithms that enable domains to securely discover paths in the collaboration environment. We implemented a simulation of our proposed framework and ran experiments to investigate the effect of several design parameters on our proposed access path discovery algorithm.

    View the Full Publication
  • 09/15/2008Object-Oriented DatabasesElisa Bertino, Giovanna Guerrini

    Object-oriented databases result from the integration of database technology with the object-oriented paradigm developed in the programming languages and software engineering areas. They have been developed to meet the requirements imposed by applications characterized by highly structured data, long transactions, data types for storing images and texts, and nonstandard, application-specific operations. Examples of such applications, which have requirements and characteristics different from those typical of traditional database applications for business and administration, are design and manufacturing systems (CAD/CAM, CIM), scientific and medical databases, geographic information systems, and multimedia databases. In this article, we discuss object-oriented databases, with a focus on data model and query language aspects.

    View the Full Publication
  • 09/11/2008Proactive Role Discovery in Mediator-Free EnvironmentsMohamed Shehab, Elisa Bertino, Arif Ghafoor

    The rapid proliferation of the Internet and related technologies has created tremendous possibilities for interoperability between domains in distributed environments. Interoperability does not come easy, as it opens the way for several security and privacy breaches. In this paper, we focus on the distributed authorization discovery problem that is crucial to enable secure interoperability. We present a distributed access path discovery framework that does not require a centralized mediator. We propose and verify a role routing protocol that propagates secure, minimal-length paths to reachable roles in other domains. Finally, we present experimental results of our role routing protocol based on a simulation implementation.
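
    The propagation of minimal-length role paths resembles a distance-vector computation, sketched here over illustrative cross-domain role links; path authentication and the protocol's security checks are omitted.

    ```python
    # Sketch: breadth-first propagation of minimal-length access paths to
    # reachable roles in other domains.
    links = {                     # role -> roles directly reachable from it
        "A.engineer": ["B.contractor"],
        "B.contractor": ["C.guest"],
        "C.guest": [],
    }

    def shortest_paths(start):
        paths, frontier = {start: [start]}, [start]
        while frontier:
            role = frontier.pop(0)
            for nxt in links.get(role, []):
                if nxt not in paths:              # keep the minimal-length path
                    paths[nxt] = paths[role] + [nxt]
                    frontier.append(nxt)
        return paths

    for role, path in shortest_paths("A.engineer").items():
        print(role, "via", " -> ".join(path))
    ```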

    View the Full Publication
  • 09/01/2008P-CDN: Extending access control capabilities of P2P systems to provide CDN servicesFatih Turkmen, Pietro Mazzoleni, Bruno Crispo, Elisa Bertino

    New important emerging business paradigms, such as ldquoservice virtualizationrdquo can be made easy and convenient by the use of P2P systems. In these paradigms, often the owners of the services are different (and independent) from the owners of the resources used to offer such services. In comparison to centralized servers, P2P systems can conveniently offer higher availability and more bandwidth as they harness the computing and network resources of thousands of hosts in a decentralized fashion. Despite these useful features and their success in the research community, P2P systems are still not very popular in the business world. The main reason for such a skepticism is their lack of proper security. In this paper, we address this issue by motivating and explaining the benefits of adding access control in P2P systems to make them more suitable and flexible as a technical platform for providing third party services. We propose an architecture augmented with access control mechanisms to enable content delivery on a P2P system. The proposed architecture is shaped according to CDN requirements. We have also tested the feasibility of our approach with a prototype implementation and the preliminary results show that our system can scale well also in the presence of very large number of policies.

    View the Full Publication
  • 08/01/2008Structural Signatures for Tree Data StructuresAshish Kundu, Elisa Bertino

    Data sharing with multiple parties over a third-party distribution framework requires that both data integrity and confidentiality be assured. One of the most widely used data organization structures is the tree structure. When such structures encode sensitive information (such as in XML documents), it is crucial that integrity and confidentiality be assured not only for the content, but also for the structure. Digital signature schemes are commonly used to authenticate the integrity of the data. The most widely used such technique for tree structures is the Merkle hash technique, which however is known to be “not hiding”, thus leading to unauthorized leakage of information. Most techniques in the literature are based on the Merkle hash technique and thus suffer from the problem of unauthorized information leakages. Assurance of integrity and confidentiality (no leakages) of tree-structured data is an important problem in the context of secure data publishing and content distribution systems. In this paper, we propose a signature scheme for tree structures, which assures both confidentiality and integrity and is also efficient, especially in third-party distribution environments. Our integrity assurance technique, which we refer to as the “Structural signature scheme”, is based on the structure of the tree as defined by tree traversals (pre-order, post-order, in-order) and is defined using a randomized notion of such traversal numbers. In addition to formally defining the technique, we prove that it protects against violations of content and structural integrity and information leakages. We also show through complexity and performance analysis that the structural signature scheme is efficient; with respect to the Merkle hash technique, it incurs comparable cost for signing the trees and incurs lower cost for user-side integrity verification.
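
    A compressed illustration of the central idea: plain traversal ranks are replaced by randomized but order-preserving numbers, and each node's content is bound to its pre-/post-order pair before the final digest that would be signed. The real scheme's details, including the signature itself, are elided, and node labels are assumed unique in this toy.

    ```python
    # Sketch: randomized order-preserving traversal numbers for a tiny tree,
    # hashed together with node content as the value to be signed.
    import hashlib, random

    tree = ("order", [("item", []), ("qty", [])])   # (label, children)

    def traversals(node, pre=None, post=None):
        if pre is None:
            pre, post = [], []
        label, children = node
        pre.append(label)
        for c in children:
            traversals(c, pre, post)
        post.append(label)
        return pre, post

    def randomized_ranks(n):
        # Strictly increasing random gaps: rank order is preserved, but the
        # numbers themselves reveal nothing beyond order (unlike 1..n).
        out, cur = [], 0
        for _ in range(n):
            cur += random.randint(1, 1 << 16)
            out.append(cur)
        return out

    pre, post = traversals(tree)
    pre_rank  = dict(zip(pre,  randomized_ranks(len(pre))))    # labels assumed unique
    post_rank = dict(zip(post, randomized_ranks(len(post))))

    digest = hashlib.sha256()
    for label in pre:                     # bind each node to its two ranks
        digest.update(f"{label}|{pre_rank[label]}|{post_rank[label]}".encode())
    print("value to be signed:", digest.hexdigest())
    ```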

    View the Full Publication
  • 08/01/2008The leaf ionome as a multivariable system to detect a plant's physiological statusIvan Baxter, Olga Vitek, Brett Lahner, Balasubramaniam Muthukumar, Monica Borghi, Joe Morrissey, Mary Lou Guerinot, David Salt

    The contention that quantitative profiles of biomolecules contain information about the physiological state of the organism has motivated a variety of high-throughput molecular profiling experiments. However, unbiased discovery and validation of biomolecular signatures from these experiments remains a challenge. Here we show that the Arabidopsis thaliana (Arabidopsis) leaf ionome, or elemental composition, contains such signatures, and we establish statistical models that connect these multivariable signatures to defined physiological responses, such as iron (Fe) and phosphorus (P) homeostasis. Iron is essential for plant growth and development, but potentially toxic at elevated levels. Because of this, shoot Fe concentrations are tightly regulated and show little variation over a range of Fe concentrations in the environment, making them a poor probe of a plant's Fe status. By evaluating the shoot ionome in plants grown under different Fe nutritional conditions, we have established a multivariable ionomic signature for the Fe response status of Arabidopsis. This signature has been validated against known Fe-response proteins and allows the high-throughput detection of the Fe status of plants with a false negative/positive rate of 18%/16%. A “metascreen” of previously collected ionomic data from 880 Arabidopsis mutants and natural accessions for this Fe response signature successfully identified the known Fe mutants frd1 and frd3. A similar approach has also been taken to identify and use a shoot ionomic signature associated with P homeostasis. This study establishes that multivariable ionomic signatures of physiological states associated with mineral nutrient homeostasis do exist in Arabidopsis and are in principle robust enough to detect specific physiological responses to environmental or genetic perturbations.

    View the Full Publication
  • 07/01/2008Verification of Receipts from M-commerce Transactions on NFC Cellular PhonesJungha Woo, Abhilasha Bhargav-Spantzel, Anna Squicciarini, Elisa Bertino

    A main challenge in mobile commerce is to make it possible for users to manage their transaction histories from both online e-commerce transactions and in-person transactions. Such histories are typically useful to build credit or to establish trust based on past transactions. In this paper we propose an approach to manage electronic receipts on cellular devices by assuring their secure and privacy-preserving usage. We provide a comprehensive notion of transactions history including both on-line transaction and in-person transactions. We apply cryptographic protocols, such as secret sharing and zero knowledge proofs, in a potentially vulnerable and constrained setting. Specifically, our approach supports flexible strategies based on Shamir's secret sharing to cater to different user requirements and architectural constraints. In addition, aggregate zero knowledge proofs are used to efficiently support proofs of various receipt attributes. We have implemented the system on Nokia NFC cellular phones and report in the paper performance evaluation results.
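
    The Shamir threshold step is standard and can be sketched on its own; the field size and secret are illustrative, and the aggregate zero-knowledge proofs are omitted.

    ```python
    # Sketch of (k, n) Shamir secret sharing: split a secret into n shares
    # so that any k of them reconstruct it via Lagrange interpolation.
    import random

    P = 2**31 - 1                                  # small prime field for the demo

    def split(secret, k, n):
        coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
        def f(x): return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
        return [(x, f(x)) for x in range(1, n + 1)]

    def reconstruct(shares):
        secret = 0
        for i, (xi, yi) in enumerate(shares):       # interpolate at x = 0
            num = den = 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = num * (-xj) % P
                    den = den * (xi - xj) % P
            secret = (secret + yi * num * pow(den, P - 2, P)) % P
        return secret

    shares = split(secret=123456789, k=3, n=5)
    print(reconstruct(random.sample(shares, 3)))    # any 3 of 5 shares suffice
    ```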

    View the Full Publication
  • 06/10/2008Genes controlling plant growth habit in Leymus (Triticeae): maize barren stalk1 (ba1), rice lax panicle, and wheat tiller inhibition (tin3) genes as possible candidates.Paraminder Kaur, Steven R. Larson, B.S. Bushman, Richard R. Wang, Ivan W. Mott, Jyothi Thimmapuram, George Gong, Lei Liu

    Leymus cinereus and L. triticoides are large caespitose and rhizomatous perennial grasses, respectively. Previous studies detected quantitative trait loci (QTL) controlling rhizome spreading near the viviparous1 (vp1) gene markers on linkage groups LG3a and LG3b in two families, TTC1 and TTC2, derived from Leymus triticoides x Leymus cinereus hybrids. The wheat tiller inhibition gene (tin3) is located on Triticum monococcum chromosome 3 A(m)L near vp1. Triticeae group 3 is reportedly collinear with rice chromosome 1, which also contains the maize barren stalk1 and rice lax branching orthogene near vp1. However, previous studies lacked cross-species markers for comparative mapping and showed possible rearrangements of Leymus group 3 in wheat-Leymus racemosus chromosome addition lines. Here, we developed expressed sequence tag (EST) markers from Leymus tiller and rhizomes and mapped sequences aligned to rice chromosome 1. Thirty-eight of 44 informative markers detected loci on Leymus LG3a and LG3b that were collinear with homoeologous sequences on rice chromosome 1 and syntenous in homoeologous group 3 wheat-Leymus and wheat-Thinopyrum addition lines. A SCARECROW-like GRAS-family transcription factor candidate gene was identified in the Leymus EST library, which aligns to the Leymus chromosome group 3 growth habit QTL and a 324-kb rice chromosome 1 region thought to contain the wheat tin3 gene.

    View the Full Publication
  • 06/04/2008Gene expression in developing watermelon fruitPat Wechter, Amnon Levi, Karen Harris, Angela Davis, Zhangjun Fei, Nurit Katzir, James Giovannoni, Ayelet Salman-Minkov, Alvaro Hernandez, Jyothi Thimmapuram, Yaakov Tadmore, Vitaly Portnoy, Tova Trebitsh

    Cultivated watermelon form large fruits that are highly variable in size, shape, color, and content, yet have extremely narrow genetic diversity. Whereas a plethora of genes involved in cell wall metabolism, ethylene biosynthesis, fruit softening, and secondary metabolism during fruit development and ripening have been identified in other plant species, little is known of the genes involved in these processes in watermelon. A microarray and quantitative Real-Time PCR-based study was conducted in watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] in order to elucidate the flow of events associated with fruit development and ripening in this species. RNA from three different maturation stages of watermelon fruits, as well as leaf, were collected from field grown plants during three consecutive years, and analyzed for gene expression using high-density photolithography microarrays and quantitative PCR.

    View the Full Publication
  • 06/01/2008EXAM: An Environment for Access Control Policy Analysis and ManagementPrathima Rao, Dan Lin, Elisa Bertino, Ninghui Li, Jorge Lobo

    As distributed collaborative applications and architectures are adopting policy-based solutions for tasks such as access control, network security and data privacy, the management and consolidation of a large number of policies is becoming a crucial component of such solutions. In large-scale distributed collaborative applications like web services, there is a need for analyzing policy interaction and performing policy integration. In this demonstration, we present EXAM, a comprehensive environment for policy analysis and management, which can be used to perform a variety of functions such as policy property analysis, policy similarity analysis, and policy integration. Our work focuses on analysis of access control policies written in XACML (Extensible Access Control Markup Language). We consider XACML policies because XACML is a rich language able to represent many policies of interest to real-world applications and is gaining widespread adoption in the industry.

    View the Full Publication
  • 05/01/2008A New Model for Secure Dissemination of XML ContentAshish Kundu, Elisa Bertino

    The paper proposes an approach to content dissemination that exploits the structural properties of an Extensible Markup Language (XML) document object model in order to provide an efficient dissemination and at the same time assuring content integrity and confidentiality. Our approach is based on the notion of encrypted postorder numbers that support the integrity and confidentiality requirements of XML content as well as facilitate efficient identification, extraction, and distribution of selected content portions. By using such notion, we develop a structure-based routing scheme that prevents information leaks in the XML data dissemination, and assures that content is delivered to users according to the access control policies, that is, policies specifying which users can receive which portions of the contents. Our proposed dissemination approach further enhances such structure-based, policy-based routing by combining it with multicast in order to achieve high efficiency in terms of bandwidth usage and speed of data delivery, thereby enhancing scalability. Our dissemination approach thus represents an efficient and secure mechanism for use in applications such as publish-subscribe systems for XML documents. The publish-subscribe model restricts the consumer and document source information to the routers with which they register. Our framework facilitates dissemination of contents with varying degrees of confidentiality and integrity requirements in a mix of trusted and untrusted networks, which is prevalent in current settings across enterprise networks and the Web. Also, it does not require the routers to be aware of any security policy in the sense that the routers do not need to implement any policy related to access control.
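
    Plain (unencrypted) postorder numbering is easy to show; the key property, that a node's number exceeds those of all its descendants, is what allows subtrees to be identified and routed. The encryption and routing layers are omitted.

    ```python
    # Sketch: assign postorder numbers to an XML element tree using only
    # the standard library.
    import xml.etree.ElementTree as ET

    doc = ET.fromstring(
        "<order><customer>Ann</customer><item><sku>42</sku></item></order>")

    def postorder_numbers(root):
        numbering, counter = {}, 0
        def visit(elem):
            nonlocal counter
            for child in elem:          # visit all children first ...
                visit(child)
            counter += 1                # ... then number the node itself
            numbering[elem] = counter
        visit(root)
        return numbering

    for elem, num in postorder_numbers(doc).items():
        print(num, elem.tag)            # customer=1, sku=2, item=3, order=4
    ```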

    View the Full Publication
  • 05/01/2008Efficient and Secure Content Processing and Distribution by Cooperative IntermediariesYunhua Koglin, Danfeng Yao, Elisa Bertino

    Content services, such as content filtering and transcoding, adapt contents to meet system requirements, display capacities, or user preferences. Data security in such a framework is an important problem, crucial for many web applications. In this paper, we propose an approach that addresses data integrity and confidentiality in content adaptation and caching by intermediaries. Our approach permits multiple intermediaries to simultaneously perform content services on different portions of the data. Our protocol supports decentralized proxy and key managements and flexible delegation of services. Our experimental results show that our approach is efficient and minimizes the amount of data transmitted across the network.

    View the Full Publication
  • 04/24/2008The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus)Ray Ming, Shaobin Hou, Yun Feng, Qingyi Yu, Jyothi Thimmapuram

    Papaya, a fruit crop cultivated in tropical and subtropical regions, is known for its nutritional benefits and medicinal applications. Here we report a 3× draft genome sequence of 'SunUp' papaya, the first commercial virus-resistant transgenic fruit tree to be sequenced. The papaya genome is three times the size of the Arabidopsis genome, but contains fewer genes, including significantly fewer disease-resistance gene analogues. Comparison of the five sequenced genomes suggests a minimal angiosperm gene set of 13,311. A lack of recent genome duplication, atypical of other angiosperm genomes sequenced so far, may account for the smaller papaya gene number in most functional groups. Nonetheless, striking amplifications in gene number within particular functional groups suggest roles in the evolution of tree-like habit, deposition and remobilization of starch reserves, attraction of seed dispersal agents, and adaptation to tropical daylengths. Transgenesis at three locations is closely associated with chloroplast insertions into the nuclear genome, and with topoisomerase I recognition sites. Papaya offers numerous advantages as a system for fruit-tree functional genomics, and this draft genome sequence provides the foundation for revealing the basis of Carica's distinguishing morpho-physiological, medicinal and nutritional properties.

    View the Full Publication
  • 04/01/2008An Efficient Time-Bound Hierarchical Key Management Scheme for Secure BroadcastingElisa Bertino, Ning Shang, Samuel S. Wagstaff Jr

    In electronic subscription and pay TV systems, data can be organized and encrypted using symmetric key algorithms according to predefined time periods and user privileges, then broadcast to users. This requires an efficient way to manage the encryption keys. In this scenario, time-bound key management schemes for a hierarchy were proposed by Tzeng and Chien in 2002 and 2005, respectively. Both schemes are insecure against collusion attacks. In this paper, we propose a new key assignment scheme for access control which is both efficient and secure. Elliptic curve cryptography is deployed in this scheme. We also provide analysis of the scheme with respect to security and efficiency issues.
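
    The paper's scheme is based on elliptic curves; purely to illustrate the general flavor of time-bound key assignment, the sketch below uses the classic two-hash-chain construction, under which a user cleared for periods [t1, t2] holds only two chain values yet can derive the key of every period inside, and of no period outside, that window. All names and parameters here are ours, not the paper's.

        import hashlib

        T = 10  # total number of time periods

        def H(x, n):
            """Apply SHA-256 n times."""
            for _ in range(n):
                x = hashlib.sha256(x).digest()
            return x

        def period_key(fwd_seed, bwd_seed, t):
            """Key for period t, derived by the key authority from both seeds."""
            return hashlib.sha256(H(fwd_seed, t) + H(bwd_seed, T - t)).hexdigest()

        a, b = b"forward-seed", b"backward-seed"
        fwd, bwd = H(a, 3), H(b, T - 7)  # all a user cleared for t in [3, 7] holds
        # derive the key for t = 5 by hashing forward from each held value:
        derived = hashlib.sha256(H(fwd, 5 - 3) + H(bwd, 7 - 5)).hexdigest()
        assert derived == period_key(a, b, 5)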

    View the Full Publication
  • 04/01/2008The effect of iron on the primary root elongation of Arabidopsis during phosphate deficiencyJames Ward, Brett Lahner, Elena Yakubova, David Salt

    Root architecture differences have been linked to the survival of plants on phosphate (P)-deficient soils, as well as to the improved yields of P-efficient crop cultivars. To understand how these differences arise, we have studied the root architectures of P-deficient Arabidopsis (Arabidopsis thaliana Columbia-0) plants. A striking aspect of the root architecture of these plants is that their primary root elongation is inhibited when grown on P-deficient medium. Here, we present evidence suggesting that this inhibition is a result of iron (Fe) toxicity. When the Fe concentration in P-deficient medium is reduced, we observe elongation of the primary root without an increase in P availability or a corresponding change in the expression of P deficiency-regulated genes. Recovery of the primary root elongation is associated with larger plant weights, improved ability to take up P from the medium, and increased tissue P content. This suggests that manipulating Fe availability to a plant could be a valuable strategy for improving a plant's ability to tolerate P deficiency.

    View the Full Publication
  • 04/01/2008A scalable call admission control algorithmZafar Ali, Waseem Sheikh, Edwin Chong, Arif Ghafoor

    In this paper, we propose a scalable algorithm for connection admission control (CAC). The algorithm applies to a Multiprotocol Label Switching (MPLS) ATM switch with a FIFO buffer. The switch carries data from statistically independent variable bit rate (VBR) sources that asynchronously alternate between ON and OFF states with exponentially distributed periods. The sources may be heterogeneous both in terms of their statistical characteristics (peak cell rate, sustained cell rate, and burst size attributes) as well as their Quality of Service (QoS) requirements.

    The performance of the proposed CAC scheme is evaluated using known performance bounds and simulation results. For the purpose of comparison, we also present scalability analyses for some of the previously proposed CAC schemes. Our results show that the proposed CAC scheme consistently performs better and operates the link close to the highest possible utilization level. Furthermore, the scheme scales well with increasing amounts of resources (link capacity and buffer size) and intelligently accommodates traffic mixes offered by sources with diverse burstiness characteristics.
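
    For intuition only, the following sketch shows the generic shape of such an admission test: each ON/OFF source is summarized by an effective bandwidth between its mean and peak rates, and a new flow is admitted only if the total stays within link capacity. The blending weight and all numbers are invented placeholders, not the paper's algorithm.

        def effective_bw(peak, mean, weight=0.5):
            """Summarize an ON/OFF source by a rate between its mean and peak."""
            return mean + weight * (peak - mean)

        def admit(flows, new_flow, capacity):
            """Admit the new flow only if the total effective bandwidth fits the link."""
            total = sum(effective_bw(p, m) for p, m in flows + [new_flow])
            return total <= capacity

        flows = [(10.0, 2.0), (8.0, 3.0)]  # (peak, mean) rates in Mb/s
        print(admit(flows, (6.0, 1.5), capacity=20.0))  # True: 15.25 <= 20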

    View the Full Publication
  • 04/01/2008A Practical and Flexible Key Management Mechanism For Trusted Collaborative ComputingXukai Zou, Yuan-Shun Dai, Elisa Bertino

    Trusted collaborative computing (TCC) is a new research and application paradigm. Two important challenges in such a context are represented by secure information transmission among the collaborating parties and selective differentiated access to data among members of collaborating groups. Addressing such challenges requires, among other things, developing techniques for secure group communication (SGC), secure dynamic conferencing (SDC), differential access control (DIF-AC), and hierarchical access control (HAC). Cryptography and key management have been intensively investigated and widely applied in order to secure information. However, there is a lack of key management mechanisms that are general and flexible enough to address all requirements arising from information transmission and data access. This paper proposes the first holistic group key management scheme which can directly support all these functions yet retain efficiency. The proposed scheme is based on the innovative concept of the access control polynomial (ACP), which can efficiently and effectively support full dynamics, flexible access control with fine-tuned granularity, and anonymity. The new scheme is immune to various attacks from both external and internal malicious parties.
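
    A toy rendering of the ACP idea (with a small modulus and plain integers standing in for hashed member secrets) may help: the group key K is hidden as A(x) = K + (x - p_1)(x - p_2)...(x - p_n) over a finite field, so evaluating A at one's own point p_i recovers K, while any other point yields an unrelated value. All dynamics and hardening details are omitted.

        P = 2**61 - 1  # a prime standing in for the scheme's finite field

        def build_acp(key, points):
            """Coefficients (low to high degree) of A(x) = key + prod(x - p_i) mod P."""
            coeffs = [1]
            for p in points:
                new = [0] * (len(coeffs) + 1)
                for i, c in enumerate(coeffs):
                    new[i] = (new[i] - c * p) % P      # c*x^i times (-p)
                    new[i + 1] = (new[i + 1] + c) % P  # c*x^i times x
                coeffs = new
            coeffs[0] = (coeffs[0] + key) % P          # hide the key in A(x)
            return coeffs

        def evaluate(coeffs, x):
            acc = 0
            for c in reversed(coeffs):  # Horner's rule
                acc = (acc * x + c) % P
            return acc

        key, members = 123456789, [11, 22, 33]  # stand-ins for hashed member secrets
        acp = build_acp(key, members)
        assert all(evaluate(acp, m) == key for m in members)  # insiders recover the key
        assert evaluate(acp, 44) != key                       # outsiders do not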

    View the Full Publication
  • 04/01/2008Trust establishment in the formation of Virtual OrganizationsAnna Squicciarini, Federica Paci, Elisa Bertino

    Virtual organizations (VOs) represent a new collaboration paradigm in which the participating entities pool resources, services, and information to achieve a common goal. VOs are often created on demand and dynamically evolve over time. One organization identifies a business opportunity and creates a VO to meet it. The rules of collaboration between the VO members are specified in a contract issued by the organization that creates the VO. VOs represent an interesting approach for companies to achieve new and profitable business opportunities by being able to dynamically partner with others. Thus, choosing the appropriate VO partners is a crucial aspect. Ensuring the trustworthiness of the members is also fundamental for making the best decisions. In this paper, we show how trust negotiation represents an effective means to select the best possible members during different stages of the VO lifecycle. We base our discussion on concrete application scenarios and illustrate the tools we created that integrate trust negotiation with a VO management toolkit.

    View the Full Publication
  • 04/01/2008Continuous Intersection Joins Over Moving ObjectsRui Zhang, Dan Lin, Kotagiri Ramamohanarao, Elisa Bertino

    The continuous intersection join query is computationally expensive yet important for various applications on moving objects. No previous study has specifically addressed this query type. We can adopt a naive algorithm or extend an existing technique (TP-Join) to process the query. However, these compute the answer for either too long or too short a time interval, which results in either a very large computation cost per object update or too frequent answer updates, respectively. This motivates us to optimize the query processing in the time dimension. In this study, we achieve this optimization by introducing the new concept of time-constrained (TC) processing. Further, TC processing enables a set of effective improvement techniques over traditional intersection join algorithms. With a thorough experimental study, we show that our algorithm outperforms the best adapted existing solution by several orders of magnitude.

    View the Full Publication
  • 04/01/2008A Hybrid Approach to Private Record LinkageAli Inan, Murat Kantarcioglu, Elisa Bertino, Maria Scannapieco

    Real-world entities are not always represented by the same set of features in different data sets. Therefore, matching and linking records corresponding to the same real-world entity distributed across these data sets is a challenging task. If the data sets contain private information, the problem becomes even harder due to privacy concerns. Existing solutions to this problem mostly follow two approaches: sanitization techniques and cryptographic techniques. The former achieves privacy by perturbing sensitive data at the expense of degrading matching accuracy. The latter, on the other hand, attains both privacy and high accuracy at the cost of heavy communication and computation. In this paper, we propose a method that combines these two approaches and enables users to trade off between privacy, accuracy and cost. Experiments conducted on real data sets show that our method has significantly lower costs than cryptographic techniques and yields much more accurate matching results compared to sanitization techniques, even when the data sets are perturbed extensively.

    View the Full Publication
  • 04/01/2008A Security Punctuation Framework for Enforcing Access Control on Streaming DataRimma Nehme, Elisa Bertino, Elke Rundensteiner

    The management of privacy and security in the context of data stream management systems (DSMS) remains largely an unaddressed problem to date. Unlike in traditional DBMSs, where access control policies are persistently stored on the server and tend to remain stable, in streaming applications the contexts, and with them the access control policies on the real-time data, may rapidly change. A person entering a casino may want to immediately block others from knowing his current whereabouts. We thus propose a novel "stream-centric" approach, where security restrictions are not persistently stored on the DSMS server, but rather streamed together with the data. Here, the access control policies are expressed via security constraints (called security punctuations, or sps for short) and are embedded into data streams. The advantages of the sp model include flexibility, dynamicity and speed of enforcement. DSMSs can adapt not only to data-related but also to security-related selectivities, which helps reduce the waste of resources when few subjects have access to the data. We propose a security-aware query algebra and new equivalence rules together with cost estimations to guide security-aware query plan optimization. We have implemented the sp framework in a real DSMS. Our experimental results show the validity and the performance advantages of our sp model as compared to alternative access control enforcement solutions for DSMSs.
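
    As a deliberately simplified sketch of the stream-centric idea (the tuple and punctuation formats here are invented), access control metadata travels inside the stream, and an enforcement operator applies the most recent punctuation to the tuples that follow it:

        def enforce(stream, subject):
            """Yield payloads the subject may see under the latest punctuation."""
            allowed = set()
            for item in stream:
                if item[0] == "sp":        # ("sp", {subjects allowed to read})
                    allowed = item[1]
                elif subject in allowed:   # ("tuple", payload)
                    yield item[1]

        stream = [("sp", {"analyst"}), ("tuple", 1), ("sp", set()), ("tuple", 2)]
        print(list(enforce(stream, "analyst")))  # [1]; tuple 2 follows a revoking sp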

    View the Full Publication
  • 04/01/2008Privately Updating Suppression and Generalization based k-Anonymous DatabasesAlberto Trombetta, Wei Jiang, Elisa Bertino, Lorenzo Bossi

    Alice, owner of a k-anonymous database, needs to determine whether her database, when inserted with a tuple owned by Bob, is still k-anonymous. Suppose that Bob is not allowed to access the database because of data confidentiality and that Alice is not allowed to read Bob's tuple due to Bob's privacy concerns. Under these assumptions, this paper proposes two protocols to check whether the database inserted with a tuple is still k-anonymous, without letting Alice and Bob know the contents of the tuple and the database, respectively.
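
    For reference, the plain, non-private version of the underlying test is straightforward: after inserting Bob's tuple, every combination of quasi-identifier values must still occur at least k times. The paper's contribution is to run this check without Alice seeing the tuple or Bob seeing the database; the sketch below ignores privacy entirely, and its data is invented.

        from collections import Counter

        def still_k_anonymous(rows, new_row, quasi_ids, k):
            """Check every quasi-identifier group still has >= k rows after insertion."""
            counts = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
            counts[tuple(new_row[q] for q in quasi_ids)] += 1
            return all(c >= k for c in counts.values())

        rows = [{"zip": "47906", "age": "20-30"}] * 2
        print(still_k_anonymous(rows, {"zip": "47906", "age": "20-30"}, ["zip", "age"], 3))  # True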

    View the Full Publication
  • 04/01/2008Protecting Databases from Query Flood AttacksAnna Squicciarini, Ivan Paloscia, Elisa Bertino

    A typical Denial of Service attack against a DBMS may occur through a query flood, that is, a large number of queries and/or updates sent by a malicious subject or several colluding malicious subjects to a target database with the intention of hindering other subjects from being serviced. In this paper we present experimental results showing that such attacks indeed degrade the performance of the DBMS; our experiments are conducted on several well-known DBMSs. We then propose some simple yet effective techniques for detecting query-flood attacks and protecting a DBMS against them.

    View the Full Publication
  • 04/01/2008Secure Delta-Publishing of XML ContentMohamed Nabeel, Elisa Bertino

    Many content distribution applications are characterized by frequent, incremental updates. Efficiency is not the only requirement, in that security is also crucial for a large spectrum of applications. The goal of this work is to develop an approach for efficient and scalable dissemination of XML documents while assuring confidentiality, integrity and completeness without requiring the third-party publishers to be trusted. The key element of our approach to reducing bandwidth requirements is the use of delta messaging. Our approach takes every possible measure to minimize indirect information leakage by keeping oblivious those parts of an XML document's structure to which clients do not have access. The experimental results show that our scheme is superior to conventional techniques for securing XML documents when the percentage of updates with respect to the original documents is low.

    View the Full Publication
  • 04/01/2008Managing Biological Data using bdbmsMohamed Eltabakh, Mourad Ouzzani, Walid G. Aref, Ahmed Elmagarmid, Yasin Silva, Mohamed Arshad, David Salt, Ivan Baxter

    We demonstrate bdbms, an extensible database engine for biological databases. bdbms started from the observation that database technology has not kept pace with the specific requirements of biological databases and that several needed key functionalities are not supported at the engine level. While bdbms aims at supporting several of these functionalities, this demo focuses on: (1) annotation and provenance management, including storage, indexing, querying, and propagation, (2) local tracking of dependencies and derivations among data items, and (3) update authorization to support data curation. We demonstrate how bdbms enables biologists to manipulate their databases, annotations, and derivation information in a unified database system, using the Purdue ionomics information management system (PiiMS) as a case study.

    View the Full Publication
  • 03/01/2008Slc39a1 to 3 (subfamily II) Zip genes in mice have unique cell-specific functions during adaptation to zinc deficiencyTaiho Kambe, Jim Geiser, Brett Lahner, David Salt

    Subfamily II of the solute carrier (Slc)39a family contains three highly conserved members (ZIPs 1–3) that share a 12-amino acid signature sequence present in the putative fourth transmembrane domain and function as zinc transporters in transfected cells. The physiological significance of this genetic redundancy is unknown. Here we report that the complete elimination of all three of these Zip genes, by targeted mutagenesis and crossbreeding mice, causes no overt phenotypic effect. When mice were fed a zinc-adequate diet, several indicators of zinc status were indistinguishable between wild-type and triple-knockout mice, including embryonic morphogenesis and growth, alkaline phosphatase activity in the embryo, ZIP4 protein in the visceral yolk sac, and initial rates (30 min) of accumulation/retention of 67Zn in liver and pancreas. When mice were fed a zinc-deficient diet, embryonic membrane-bound alkaline phosphatase activity was reduced to a much greater extent, and 80% of the embryos of the triple-knockout mice developed abnormally compared with 12% of the embryos of wild-type mice. During zinc deficiency, the accumulation/retention (3 h) of 67Zn in the liver and pancreas of weanlings was significantly impaired in the triple-knockout mice compared with wild-type mice. Thus none of these three mammalian Zip genes apparently plays a critical role in zinc homeostasis when zinc is replete, but they play important, noncompensatory roles when this metal is deficient.

    View the Full Publication
  • 03/01/2008Hierarchical Domains for Decentralized Administration of Spatially-Aware RBAC SystemsMaria Luisa Damiani, Claudio Silvestri, Elisa Bertino

    Emerging models for context-aware role-based access control pose challenging requirements over policy administration. In this paper we address the issues raised by the decentralized administration of a spatially-aware access control model in a mobile setting. We present GEO-RBAC Admin, the administration model for the GEO-RBAC model. The model is based on the notion of hierarchy of spatial domains; a spatial domain is an entity grouping objects based on organizational and spatial proximity criteria. In the paper we formally define the model and introduce and prove relevant properties.

    View the Full Publication
  • 02/01/2008XACML Policy Integration AlgorithmsPietro Mazzoleni, Bruno Crispo, Swaminathan Sivasubramanian, Elisa Bertino

    XACML is the OASIS standard language specifically aimed at the specification of authorization policies. While XACML fits well with the security requirements of a single enterprise (even if large and composed of multiple departments), it does not address the requirements of virtual enterprises in which several autonomous subjects collaborate by sharing their resources to provide better services to customers. In this article we highlight these limitations and propose an XACML extension, the policy integration algorithms, to address them. In the article we also present the implementation of a system that makes use of the policy integration algorithms to securely replicate information in a P2P-like environment. In our solution, the data replication process considers the policies specified by both the owners of the shared data and the peers sharing data storage.

    View the Full Publication
  • 01/01/2008Continuous Queries in Spatio-temporal DatabasesXiaopeng Xiong, Mohamed Mokbel, Walid G. Aref

    A continuous query is a new query type that is issued once and is evaluated continuously in a database server until the query is explicitly terminated. The most important characteristic of continuous queries is that their result depends not only on the data currently in the database but also on continuously arriving data. During the execution of a continuous query, the query result is updated continuously as new data arrives. Continuous queries are essential to applications that are interested in transient and frequently updated objects and require monitoring query results continuously. Potential applications of continuous queries include, but are not limited to, real-time location-aware services, network flow monitoring, online data analysis and sensor networks.

    View the Full Publication
  • 01/01/2008Spatio-temporal DatabaseXiaopeng Xiong, Mohamed Mokbel, Walid G. Aref

    A spatiotemporal database is a new type of database system that manages spatiotemporal objects and supports corresponding query functionalities. A spatiotemporal object is an object whose spatial location and/or extent changes dynamically over time. A typical example of a spatiotemporal object is a moving object (e.g., a car, a flight or a pedestrian) whose location continuously changes. Spatiotemporal databases have many important applications such as Geographic Information Systems, Location-aware Systems, Traffic Monitoring Systems, and Environmental Information Systems. Due to their importance, spatiotemporal database systems are very actively researched in the database domain. Interested readers are referred to [1] for a detailed survey of spatiotemporal databases.

    View the Full Publication
  • 01/01/2008Space-Filling CurvesMohamed Mokbel, Walid G. Aref

    A space-filling curve (SFC) is a way of mapping a multi-dimensional space into a one-dimensional space. It acts like a thread that passes through every cell element (or pixel) in the multi-dimensional space so that every cell is visited exactly once. Thus, a space-filling curve imposes a linear order on the points of the multi-dimensional space. A D-dimensional space-filling curve in a space of N cells (pixels) per dimension consists of N^D − 1 segments, where each segment connects two consecutive D-dimensional points. There are numerous kinds of space-filling curves (e.g., Hilbert, Peano, and Gray). The difference between such curves lies in the way they map to the one-dimensional space, i.e., the order in which a certain space-filling curve traverses the multi-dimensional space. The quality of a space-filling curve is measured by its ability to preserve the locality (or relative distance) of multi-dimensional points in the mapped one-dimensional space. The main idea is that any two D-dimensional points that are close by in the D-dimensional space should also be close by in the one-dimensional space.
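
    As one concrete example (ours, not drawn from this entry), the Z-order or Morton curve maps a 2-D point to a 1-D index by interleaving the bits of its coordinates, which tends to keep nearby points close on the curve:

        def z_order(x, y, bits=16):
            """Interleave the bits of x and y into a single Morton index."""
            index = 0
            for i in range(bits):
                index |= ((x >> i) & 1) << (2 * i)      # even bit positions from x
                index |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions from y
            return index

        print(z_order(3, 5))  # 39: the bits of 3 (011) and 5 (101), interleaved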

    View the Full Publication
  • 01/01/2008Data management challenges for computational transportationWalid G. Aref, Mourad Ouzzani

    Computational Transportation is an emerging discipline that poses many data management challenges. Computational transportation is characterized by the existence of a massive number of moving objects, moving sensors, and moving queries. This paper highlights important data management challenges for computational transportation and promising approaches towards addressing them.

    View the Full Publication
  • 01/01/2008The SBC-tree: an index for run-length compressed sequencesMohamed Eltabakh, Wing-Kai Hon, Rahul Shah, Walid G. Aref, Jeffrey S. Vitter

    Run-Length-Encoding (RLE) is a data compression technique that is used in various applications, e.g., time series, biological sequences, and multimedia databases. One of the main challenges is how to operate on (e.g., index, search, and retrieve) compressed data without decompressing it. In this paper, we introduce the String B-tree for Compressed sequences, termed the SBC-tree, for indexing and searching RLE-compressed sequences of arbitrary length. The SBC-tree is a two-level index structure based on the well-known String B-tree and a 3-sided range query structure [7]. The SBC-tree supports pattern matching queries such as substring matching, prefix matching, and range search operations over RLE-compressed sequences. The SBC-tree has an optimal external-memory space complexity of O(N/B) pages, where N is the total length of the compressed sequences, and B is the disk page size. Substring matching, prefix matching, and range search execute in an optimal O(log_B N + |p| + T/B) I/O operations, where |p| is the length of the compressed query pattern and T is the query output size. The SBC-tree is also dynamic and supports insert and delete operations efficiently. The insertion and deletion of all suffixes of a compressed sequence of length m take O(m log_B(N + m)) amortized I/O operations. The SBC-tree index is realized inside PostgreSQL. Performance results illustrate that using the SBC-tree to index RLE-compressed sequences achieves up to an order of magnitude reduction in storage, while retaining the optimal search performance achieved by the String B-tree over the uncompressed sequences.
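
    For concreteness, plain run-length encoding, the input representation the SBC-tree indexes, looks as follows; the index structure itself is, of course, far more involved:

        def rle_encode(s):
            """Compress a string into (symbol, run length) pairs."""
            runs, i = [], 0
            while i < len(s):
                j = i
                while j < len(s) and s[j] == s[i]:
                    j += 1
                runs.append((s[i], j - i))
                i = j
            return runs

        print(rle_encode("aaabccccd"))  # [('a', 3), ('b', 1), ('c', 4), ('d', 1)]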

    View the Full Publication
  • 01/01/2008Detecting anomalous access patterns in relational databasesAshish Kamra, Evimaria Terzi, Elisa Bertino

    A considerable effort has recently been devoted to the development of Database Management Systems (DBMS) which guarantee high assurance and security. An important component of any strong security solution is represented by Intrusion Detection (ID) techniques, able to detect anomalous behavior of applications and users. To date, however, few ID mechanisms have been proposed which are specifically tailored to function within the DBMS. In this paper, we propose such a mechanism. Our approach is based on mining SQL queries stored in database audit log files. The result of the mining process is used to form profiles that can model normal database access behavior and identify intruders. We consider two different scenarios while addressing the problem. In the first case, we assume that the database has a Role Based Access Control (RBAC) model in place. Under an RBAC system permissions are associated with roles, grouping several users, rather than with single users. Our ID system is able to determine role intruders, that is, individuals who, while holding a specific role, behave differently than expected. An important advantage of providing an ID technique specifically tailored to RBAC databases is that it can help in protecting against insider threats. Furthermore, the existence of roles makes our approach usable even for databases with a large user population. In the second scenario, we assume that there are no roles associated with users of the database. In this case, we look directly at the behavior of the users. We employ clustering algorithms to form concise profiles representing normal user behavior. For detection, we either use these clustered profiles as the roles or employ outlier detection techniques to identify behavior that deviates from the profiles. Our preliminary experimental evaluation on both real and synthetic database traces shows that our methods work well in practical situations.

    View the Full Publication
  • 01/01/2008Spatial Domains for the Administration of Location-based Access Control PoliciesMaria Luisa Damiani, Elisa Bertino, Claudio Silvestri

    In the last few years there has been increasing interest in a novel category of access control models known as location-based or spatially-aware role-based access control (RBAC) models. These models advance classical RBAC models in that they regulate access to sensitive resources based on the position of mobile users. An issue that has not yet been investigated is how to administer spatially-aware access control policies. In this paper we introduce GEO-RBAC Admin, the administration model for the location-based GEO-RBAC model. We discuss the concepts underlying such an administrative model and present a language for the specification of GEO-RBAC policies.

    View the Full Publication
  • 01/01/2008An Access-Control Framework for WS-BPELFederica Paci, Elisa Bertino, Jason Crampton

    Business processes, the next-generation workflows, have attracted considerable research interest in the last 15 years. More recently, several XML-based languages have been proposed for specifying and orchestrating business processes, resulting in the WS-BPEL language. Even if WS-BPEL has been developed to specify automated business processes that orchestrate activities of multiple Web services, there are many applications and situations requiring that people be considered as additional participants who can influence the execution of a process. Significant omissions from WS-BPEL are the specification of activities that require interactions with humans to be completed, called human activities, and the specification of authorization information associating users with human activities in a WS-BPEL business process and authorization constraints, such as separation of duty, on the execution of human activities. In this article, we address these deficiencies by introducing a new type of WS-BPEL activity to model human activities and by developing RBAC-WS-BPEL, a role-based access-control model for WS-BPEL, and BPCL, a language to specify authorization constraints.

    View the Full Publication
  • 01/01/2008Watermarking Relational Databases Using Optimization-Based TechniquesMohamed Shehab, Elisa Bertino, Arif Ghafoor

    Proving ownership rights on outsourced relational databases is a crucial issue in today's Internet-based application environments and in many content distribution applications. In this paper, we present a mechanism for proof of ownership based on the secure embedding of a robust imperceptible watermark in relational data. We formulate the watermarking of relational databases as a constrained optimization problem, and discuss efficient techniques to solve the optimization problem and to handle the constraints. Our watermarking technique is resilient to watermark synchronization errors because it uses a partitioning approach that does not require marker tuples. Our approach overcomes a major weakness in previously proposed watermarking techniques. Watermark decoding is based on a threshold-based technique characterized by an optimal threshold that minimizes the probability of decoding errors. We developed a proof-of-concept implementation of our watermarking technique and showed by experimental results that our technique is resilient to tuple deletion, alteration and insertion attacks.

    View the Full Publication
  • 01/01/2008Community-Cyberinfrastructure-Enabled Discovery in Science and EngineeringAhmed Elmagarmid, Arjmand Samuel, Mourad Ouzzani

    A community cyberinfrastructure would enable a new era of multidisciplinary research and collaboration in science and engineering. With such an infrastructure, researchers could share knowledge and results along with computing cycles, storage, and bandwidth. A generic, transparent cyberinfrastructure would also foster more meaningful analyses of data and visualization, modeling, and simulation of real-world phenomena.

    View the Full Publication
  • 01/01/2008Determination of the Limiting Oil Viscosity for Chemical Dispersion at SeaKevin Cokomb, David Salt, Malcolm Peddar, Alun Lewis

    Many previous studies using laboratory test methods have shown that the ability to disperse spilled oils depends on several factors including: spilled oil properties (and how these change with oil weathering), the mixing energy, and the dispersant-to-oil ratio (DOR). There appears to be a ‘limiting oil viscosity’ value that, when exceeded, causes a sharp reduction in the effectiveness of a dispersant. The results obtained in laboratory tests are relative and not absolute, and it has therefore proved very difficult to correlate dispersant effectiveness results from these laboratory tests with dispersant performance at sea. A series of small-scale dispersant tests were conducted at sea in the English Channel in June 2003. Several small test slicks of residual fuel oils of different viscosity grades were laid on the sea and immediately sprayed with different dispersants at different DORs. Observers used a simple ranking system to visually assess the degree of dispersion that occurred when a cresting wave passed through an area of the dispersant-treated oil. Collation of the results showed that there were obvious and consistent differences in the degree of effectiveness observed with different combinations of oil viscosity, dispersant and treatment rate.

    View the Full Publication
  • 01/01/2008Future directions in multimedia retrieval: impact of new technologyAidong Zhang, Shih-Fu Chang, Arif Ghafoor, Thomas Huang, Ramesh Jain, Rainer Lienhart

    Panel summary: With experts from the multimedia research communities, this panel explores the impact of new technology on possible future directions of research on multimedia retrieval. Recent advances in technology, such as various sensors and all kinds of cameras, ranging from those installed on personal phones to surveillance equipment, have contributed to the acceleration of the accumulation of multimedia data. In addition, advances in storage, networking, and web design have made these data easily accessible. However, increasingly fast and extensive applications of these technologies in science, industry, education, entertainment, and art have also created new needs for more sophisticated tools for more efficient searching and browsing of multimedia data. We are, therefore, facing a challenge: on the one hand, the traditional retrieval techniques must now be updated into more advanced search and browsing tools that can impact users in the real world, and on the other hand, we find ourselves, once again, in the position to envision and to provide innovative approaches to the management and retrieval of these fast-growing data of massive quantities. This panel discusses new research issues and problems on the topics of content analysis, user interaction, content description and indexing, and evaluation, for the purpose of developing next-generation tools for ever more advanced applications in multimedia search and browsing.

    View the Full Publication
  • 01/01/2008Genetic and physiological basis of adaptive salt tolerance divergence between coastal and inland Mimulus guttatusDavid Lowry, Megan Hall, David Salt, John Willis

    Local adaptation is a well-established phenomenon whereby habitat-mediated natural selection drives the differentiation of populations. However, little is known about how specific traits and loci combine to cause local adaptation. Here, we conducted a set of experiments to determine which physiological mechanisms contribute to locally adaptive divergence in salt tolerance between coastal perennial and inland annual ecotypes of Mimulus guttatus. Quantitative trait locus (QTL) mapping was used to discover loci involved in salt spray tolerance and leaf sodium (Na(+)) concentration. To determine whether these QTLs confer fitness in the field, we examined their effects in reciprocal transplant experiments using recombinant inbred lines (RILs). Coastal plants had constitutively higher leaf Na(+) concentrations and greater levels of tissue tolerance, but no difference in osmotic stress tolerance. Three QTLs contributed to salt spray tolerance and two QTLs to leaf Na(+) concentration. All three salt-spray tolerance QTLs had significant fitness effects at the coastal field site but no effects inland. Leaf Na(+) QTLs had no detectable fitness effects in the field. Physiological results are consistent with adaptation of coastal populations to salt spray and soil salinity. Field results suggest that there may not be trade-offs across habitats for alleles involved in local salt spray adaptations.

    View the Full Publication
  • 01/01/2008Phytoremediation of metalsDavid Salt, Alan Baker

    Introduction: For phytoextraction to be a viable alternative to existing soil remediation strategies it will require the existence of high biomass, rapidly growing metal-accumulating plants. It is also of critical importance that the concentration of metal in the harvestable plant tissue be higher than in the soil. This will ensure that the volume of contaminated plant material generated by the phytoextraction processes is less than the original volume of the contaminated soil. Unfortunately, plants do not exist at present which have these desirable characteristics. To generate this type of plant requires detailed information on the rate-limiting steps in the phytoextraction process.

    View the Full Publication
  • 01/01/2008Usage-Based Schema MatchingHazem Elmeleegy, Mourad Ouzzani, Ahmed Elmagarmid

    Existing techniques for schema matching are classified as either schema-based, instance-based, or a combination of both. In this paper, we define a new class of techniques, called usage-based schema matching. The idea is to exploit information extracted from the query logs to find correspondences between attributes in the schemas to be matched. We propose methods to identify co-occurrence patterns between attributes in addition to other features such as their use in joins and with aggregate functions. Several scoring functions are considered to measure the similarity of the extracted features, and a genetic algorithm is employed to find the highest-score mappings between the two schemas. Our technique is suitable for matching schemas even when their attribute names are opaque. It can further be combined with existing techniques to obtain more accurate results. Our experimental study demonstrates the effectiveness of the proposed approach and the benefit of combining it with other existing approaches.

    View the Full Publication
  • 01/01/2008Context-Aware Adaptation of Access-Control PoliciesArjmand Samuel, Arif Ghafoor, Elisa Bertino

    Today, public-service delivery mechanisms such as hospitals, police, and fire departments rely on digital generation, storage, and analysis of vital information. To protect critical digital resources, these organizations employ access-control mechanisms, which define rules under which authorized users can access the resources they need to perform organizational tasks. Natural or man-made disasters pose a unique challenge, whereby previously defined constraints can potentially debilitate an organization's ability to act. Here, the authors propose employing contextual parameters - specifically, activity context in the form of emergency warnings - to adapt access-control policies according to a priori configuration.

    View the Full Publication
  • 01/01/2008Formal foundations for hybrid hierarchies in GTRBACJames Joshi, Elisa Bertino, Arif Ghafoor, Yue Zhang

    A role hierarchy defines permission acquisition and role-activation semantics through role-role relationships. It can be utilized for efficiently and effectively structuring functional roles of an organization having related access-control needs. The focus of this paper is the analysis of hybrid role hierarchies in the context of the generalized temporal role-based access control (GTRBAC) model that allows specification of a comprehensive set of temporal constraints on role, user-role, and role-permission assignments. We introduce the notion of a uniquely activable set (UAS) associated with a role hierarchy that indicates the access capabilities of a user resulting from his membership to a role in the hierarchy. Identifying such a role set is essential when making an authorization decision about whether or not a user should be allowed to activate a particular combination of roles in a single session. We formally show how the UAS can be determined for a hybrid hierarchy. Furthermore, within a hybrid hierarchy, various hierarchical relations may be derived between an arbitrary pair of roles. We present a set of inference rules that can be used to generate all the possible derived relations that can be inferred from a specified set of hierarchical relations, and show that it is sound and complete. We also present an analysis of hierarchy transformations with respect to role addition, deletion, and partitioning, and show how various cases of these transformations allow the original permission acquisition and role-activation semantics to be managed. The formal results presented here provide a basis for developing efficient security administration and management tools.

    View the Full Publication
  • 01/01/2008An Approach to Evaluate Data Trustworthiness Based on Data ProvenanceChenyun Dai, Dan Lin, Elisa Bertino, Murat Kantarcioglu

    Today, with the advances of information technology, individuals and organizations can obtain and process data from different sources. It is critical to ensure data integrity so that effective decisions can be made based on these data. An important component of any solution for assessing data integrity is represented by techniques and tools to evaluate the trustworthiness of data provenance. However, few efforts have been devoted to investigating approaches for assessing how trustworthy the data are, based in turn on an assessment of the data sources and intermediaries. To bridge this gap, we propose a data provenance trust model which takes into account various factors that may affect trustworthiness and, based on these factors, assigns trust scores to both data and data providers. Such trust scores represent key information based on which data users may decide whether to use the data and for what purposes.

    View the Full Publication
  • 01/01/2008Responding to Anomalous Database RequestsAshish Kamra, Elisa Bertino, Rimma Nehme

    Organizations have recently shown increased interest in database activity monitoring and anomaly detection techniques to safeguard their internal databases. Once an anomaly is detected, a response from the database is needed to contain the effects of the anomaly. However, the problem of issuing an appropriate response to a detected database anomaly has received little attention so far. In this paper, we propose a framework and policy language for issuing a response to a database anomaly based on the characteristics of the anomaly. We also propose a novel approach to dynamically change the state of the access control system in order to contain the damage that may be caused by the anomalous request. We have implemented our mechanisms in PostgreSQL and in the paper we discuss relevant implementation issues. We have also carried out an experimental evaluation to assess the performance overhead introduced by our response mechanism. The experimental results show that the techniques are very efficient.

    View the Full Publication
  • 01/01/2008ARUBA: A Risk-Utility-Based Algorithm for Data DisclosureMohamed Fouad, Guy Lebanon, Elisa Bertino

    Dealing with sensitive data has been the focus of much recent research. On the one hand, data disclosure may incur some risk due to security breaches, but on the other hand, data sharing has many advantages. For example, revealing customer transactions at a grocery store may be beneficial when studying purchasing patterns and market demand. However, a potential misuse of the revealed information may be harmful due to privacy violations. In this paper we study the tradeoff between data disclosure and data retention. Specifically, we address the problem of minimizing the risk of data disclosure while maintaining its utility above a certain acceptable threshold. We formulate the problem as a discrete optimization problem and leverage the special monotonicity characteristics of both risk and utility to construct an efficient algorithm to solve it. Such an algorithm determines the optimal transformations that need to be performed on the microdata before they are released. These optimal transformations take into account both the risk associated with data disclosure and its benefit (referred to as utility). Through extensive experimental studies we compare the performance of our proposed algorithm with other data disclosure algorithms in the literature in terms of risk, utility, and time. We show that our proposed framework outperforms other techniques for sensitive data disclosure.

    View the Full Publication
  • 01/01/2008Security and privacy for geospatial data: concepts and research directionsElisa Bertino, Bhavani Thuraisingham, Michael Gertz, Maria Luisa Damiani

    Geospatial data play a key role in a wide spectrum of critical data management applications, such as disaster and emergency management, environmental monitoring, land and city planning, and military operations, often requiring the coordination among diverse organizations, their data repositories, and users with different responsibilities. Although a variety of models and techniques are available to manage, access and share geospatial data, very little attention has been paid to addressing security concerns, such as access control, security and privacy policies, and the development of secure and in particular interoperable GIS applications. The objective of this paper is to discuss the technical challenges raised by the unique requirements of secure geospatial data management and to suggest a comprehensive framework for security and privacy for geospatial data and GIS. Such a framework is the first coherent architectural approach to the problem of security and privacy for geospatial data.

    View the Full Publication
  • 01/01/2008Position transformation: a location privacy protection method for moving objectsDan Lin, Elisa Bertino, Reynold Cheng, Sunil Prabhakar

    The expanding use of location-based services has profound implications on the privacy of personal information. In this paper, we propose a framework for preserving location privacy based on the idea of sending to the service provider suitably modified location information. Agents execute data transformation and the service provider directly processes the transformed dataset. Our technique not only prevents the service provider from knowing the exact locations of users, but also protects information about user movements and locations from being disclosed to other users who are not authorized to access this information. We also define a privacy model to analyze our framework, and examine our approach experimentally.
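
    A naive, hypothetical sketch of the general idea (the actual framework is more elaborate, and the parameters here are invented): an agent applies a secret distance-preserving transformation, here a rotation plus a shift, before positions reach the service provider, so proximity-based queries can still be answered on the transformed points.

        import math

        THETA, DX, DY = 0.73, 41.0, -7.0  # the agent's secret parameters

        def transform(x, y):
            """Rotate by THETA, then translate: distances are preserved."""
            xr = x * math.cos(THETA) - y * math.sin(THETA)
            yr = x * math.sin(THETA) + y * math.cos(THETA)
            return xr + DX, yr + DY

        a, b = (0.0, 0.0), (3.0, 4.0)
        ta, tb = transform(*a), transform(*b)
        print(round(math.dist(ta, tb), 6))  # 5.0, the same as the original distance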

    View the Full Publication
  • 01/01/2008VeryIDX - A Digital Identity Management System for Pervasive Computing EnvironmentsFederica Paci, Elisa Bertino, Sam Kerr, Aaron Lint, Anna Squicciarini, Jungha Woo

    The problem of identity theft, that is, the act of impersonating others’ identities by presenting stolen identifiers or proofs of identities, has been receiving increasing attention because of its high financial and social costs. In this paper we address such problem by proposing an approach to manage user identity attributes by assuring their privacy-preserving usage. The approach is based on the concept of privacy preserving multi-factor authentication achieved by a new cryptographic primitive which uses aggregate signatures on commitments that are then used for aggregate zero-knowledge proof of knowledge (ZKPK) protocols. We present the implementation of such approach on Nokia NFC cellular phones and report performance evaluation results.

    View the Full Publication
  • 01/01/2008Mining roles with semantic meaningsIan Molloy, Hong Chen, Tiancheng Li, Elisa Bertino, Seraphin Calo, Jorge Lobo

    With the growing adoption of role-based access control (RBAC) in commercial security and identity management products, how to facilitate the process of migrating a non-RBAC system to an RBAC system has become a problem with significant business impact. Researchers have proposed to use data mining techniques to discover roles to complement the costly top-down approaches for RBAC system construction. A key problem that has not been adequately addressed by existing role mining approaches is how to discover roles with semantic meanings. In this paper, we study the problem in two settings with different information availability. When the only information available is the user-permission relation, we propose to discover roles whose semantic meaning is based on formal concept lattices. We argue that the theory of formal concept analysis provides a solid theoretical foundation for mining roles from the user-permission relation. When user-attribute information is also available, we propose to create roles that can be explained by expressions of user attributes. Since an expression of attributes describes a real-world concept, the corresponding role represents a real-world concept as well. Furthermore, the algorithms we propose balance the semantic guarantee of roles with system complexity. Our experimental results demonstrate the effectiveness of our approaches.
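
    The formal-concept view can be made concrete with a brute-force toy, shown below with invented data: a role candidate is a pair of a user set and a permission set that are closed with respect to each other. Real role-mining algorithms are of course far more careful.

        from itertools import combinations

        UP = {  # user -> permissions
            "ann": {"read", "write"},
            "bob": {"read", "write"},
            "carl": {"read"},
        }

        def concepts(up):
            """Enumerate closed (users, permissions) pairs by brute force."""
            users, found = list(up), set()
            for r in range(1, len(users) + 1):
                for group in combinations(users, r):
                    perms = set.intersection(*(up[u] for u in group))
                    if not perms:
                        continue
                    closed = frozenset(u for u in users if perms <= up[u])
                    found.add((closed, frozenset(perms)))
            return found

        for us, ps in concepts(UP):
            print(sorted(us), sorted(ps))  # e.g. ['ann', 'bob'] ['read', 'write']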

    View the Full Publication
  • 01/01/2008An obligation model bridging access control policies and privacy policiesQun Ni, Elisa Bertino, Jorge Lobo

    In this paper, we present a novel obligation model for the Core Privacy-aware Role Based Access Control (P-RBAC), and discuss some design issues in detail. Pre-obligations, post-obligations, conditional obligations, and repeating obligations are supported by the obligation model. Interaction between permissions and obligations is discussed, and efficient algorithms are provided to detect undesired effects.

    View the Full Publication
  • 01/01/2008Policy decomposition for collaborative access controlDan Lin, Prathima Rao, Elisa Bertino, Ninghui Li, Jorge Lobo

    With the advances in web service techniques, new collaborative applications have emerged, such as supply chain arrangements and coalitions among government agencies. In such applications, the collaborating parties are responsible for managing and protecting resources entrusted to them. Access control decisions thus become a collaborative activity in which a global policy must be enforced by a set of collaborating parties without compromising the autonomy or confidentiality requirements of these parties. Unfortunately, none of the conventional access control systems meets these new requirements. To support collaborative access control, in this paper, we propose a novel policy-based access control model. Our main idea is based on the notion of policy decomposition, and we propose an extension to the reference architecture for XACML. We present algorithms for decomposing a global policy and efficiently evaluating requests.

    View the Full Publication
  • 01/01/2008Database Intrusion Detection and ResponseAshish Kamra, Elisa Bertino

    Why is it important to have an intrusion detection (ID) mechanism tailored for a database management system (DBMS)? There are three main reasons for this. First, actions deemed malicious for a DBMS are not necessarily malicious for the underlying operating system or the network; thus ID systems designed for the latter may not be effective against database attacks. Second, organizations have stepped up data vigilance driven by various government regulations concerning data management such as SOX, GLBA, HIPAA and so forth. Third, and this is probably the most relevant reason, the problem of insider threats is being recognized as a major security threat; its solution requires among other techniques the adoption of mechanisms able to detect access anomalies by users internal to the organization owning the data.

    View the Full Publication
  • 01/01/2008Automatic Compliance of Privacy Policies in Federated Digital Identity ManagementAnna Squicciarini, Marco Casassa Mont, Abhilasha Bhargav-Spantzel, Elisa Bertino

    Privacy in the digital world is an important problem which is becoming even more pressing as new collaborative applications are developed. The lack of privacy-preserving mechanisms is particularly problematic in federated identity management contexts. In such a context, users can seamlessly interact with a variety of federated web services, through the use of single-sign-on mechanisms and the capability of sharing personal data among these web services. We argue that comprehensive privacy policies should be stated by federated service providers and proactively checked by these providers, before disclosing users’ data to federated partners. To address such requirements, we introduce mechanisms and algorithms for policy compliance checking between federated service providers, based on an innovative policy subsumption approach. We formally introduce and analyze our approach.

    View the Full Publication
  • 01/01/2008Authorization and User Failure Resiliency for WS-BPEL Business ProcessesFederica Paci, Rodolfo Ferrini, Yuqing Sun, Elisa Bertino

    We investigate the problem of WS-BPEL process resiliency in RBAC-WS-BPEL, an authorization model for WS-BPEL that supports the specification of authorizations for the execution of WS-BPEL process activities by roles and users, together with authorization constraints such as separation and binding of duty. The goal of resiliency is to guarantee that even if some users become unavailable during the execution of a WS-BPEL process, the remaining users can still complete the execution of the process. We extend RBAC-WS-BPEL with a new type of constraint, called resiliency constraints, and the notion of user failure resiliency for WS-BPEL processes, and propose an algorithm to determine whether a WS-BPEL process is user failure resilient.

    View the Full Publication
  • 01/01/2008Multigranular spatio-temporal models: implementation challengesElena Camossi, Michela Bertolotto, Elisa Bertino

    Multiple granularities provide essential support for extracting significant knowledge from spatio-temporal datasets at different levels of detail. They make it possible to zoom in and out of spatio-temporal datasets, thus enhancing data modelling flexibility and improving the analysis of information. In this paper we investigate the implementation issues arising when a data model and a query language are enriched with spatio-temporal multigranularity. We introduce appropriate representations for the space and time dimensions, granularities, granules, and multi-granular values. Finally, we discuss how multigranular spatio-temporal conversions affect data usability and how this important property may be guaranteed.

    View the Full Publication
  • 01/01/2008Identity-based long running negotiationsAnna Squicciarini, Alberto Trombetta, Elisa Bertino, Stefano Braghin

    Identity-based negotiations are convenient protocols to closely control users' personal data, empowering users to negotiate trust with unknown counterparts by carefully governing the disclosure of their identities. Such negotiations present, however, unique challenges, mainly caused by the way identity attributes are distributed and managed. In this paper we present a novel approach for conducting long running negotiations in the context of digital identity management systems. We propose some major extensions to an existing trust negotiation protocol to support negotiations that are conducted over multiple sessions. To the best of our knowledge, this is the first time a protocol for conducting trust negotiations over multiple sessions is presented.

    View the Full Publication
  • 01/01/2008Minimal credential disclosure in trust negotiationsFederica Paci, David Bauer, Elisa Bertino, Douglas Blough, Anna Squicciarini

    The secure release of identity attributes is a key enabler for electronic business interactions. Integrity and confidentiality of identity attributes are two key requirements in such a context. Users should also have the maximum possible control over the release of their identity attributes and should be able to state under which conditions these attributes can be disclosed. Moreover, users should disclose only the identity attributes that are actually required for the transactions at hand. In this paper we present an approach for the controlled release of identity attributes that addresses such requirements. The approach is based on the integration of trust negotiation and minimal credential disclosure techniques. Trust negotiation supports selective and incremental disclosure of identity attributes, while minimal credential disclosure guarantees that only the attributes necessary to complete the online interactions are disclosed.

    View the Full Publication
  • 01/01/2008Querying Multigranular Spatio-temporal ObjectsElena Camossi, Michela Bertolotto, Elisa Bertino

    The integrated management of both the spatial and temporal components of information is crucial in order to extract significant knowledge from datasets concerning phenomena of interest to a large variety of applications. Moreover, multigranularity, i.e., the capability of representing information at different levels of detail, enhances data modelling flexibility and improves the analysis of information, making it possible to zoom in and out of spatio-temporal datasets. Relying on an existing multigranular spatio-temporal extension of the ODMG data model, in this paper we describe the design of a multigranular spatio-temporal query language. We extend OQL value comparison and object navigation in order to access spatio-temporal objects with attribute values defined at different levels of detail.

    View the Full Publication
  • 01/01/2008A Federated Digital Identity Management Approach for Business ProcessesElisa Bertino, Rodolfo Ferrini, Andrea Musci, Federica Paci, Kevin Steuer Jr

    Business processes have gained a lot of attention because of the pressing need for integrating existing resources and services to better fulfill customer needs. A key feature of business processes is that they are built from composable services, referred to as component services, that may belong to different domains. In such a context, flexible multi-domain identity management solutions are crucial for increased security and user convenience. In particular, it is important that during the execution of a business process the component services be able to verify the identity of the client to check that it has the required permissions for accessing the services. To address the problem of multi-domain identity management, we propose a multi-factor identity attribute verification protocol for business processes that assures client privacy and handles naming heterogeneity.

    View the Full Publication
  • 01/01/2008High-Assurance Integrity Techniques for DatabasesElisa Bertino, Chenyun Dai, Hyo-Sang Lim, Dan Lin

    With the increased need for data sharing among multiple organizations, such as government organizations, financial corporations, medical hospitals and academic institutions, it is critical to ensure data integrity so that effective decisions can be made based on these data. In this paper, we first present an architecture for a comprehensive integrity control system based on data validation and metadata management. We then discuss an important issue in data validation, namely the evaluation of data provenance, and propose a trust model for estimating the trustworthiness of data and data providers. By taking into account confidence about data provenance, we introduce an approach for policy-observing query evaluation as a complement to the integrity control system.

    View the Full Publication
  • 01/01/2008The SBC-tree: an index for run-length compressed sequencesMohamed Eltabakh, Wing-Kai Hon, Rahul Shah, Walid G. Aref, Jeffrey S. Vitter

    Run-Length-Encoding (RLE) is a data compression technique that is used in various applications, e.g., time series, biological sequences, and multimedia databases. One of the main challenges is how to operate on (e.g., index, search, and retrieve) compressed data without decompressing it. In this paper, we introduce the String B-tree for Compressed sequences, termed the SBC-tree, for indexing and searching RLE-compressed sequences of arbitrary length. The SBC-tree is a two-level index structure based on the well-known String B-tree and a 3-sided range query structure [7]. The SBC-tree supports pattern matching queries such as substring matching, prefix matching, and range search operations over RLE-compressed sequences. The SBC-tree has an optimal external-memory space complexity of O(N/B) pages, where N is the total length of the compressed sequences and B is the disk page size. Substring matching, prefix matching, and range search execute in an optimal O(log_B N + (|p| + T)/B) I/O operations, where |p| is the length of the compressed query pattern and T is the query output size. The SBC-tree is also dynamic and supports insert and delete operations efficiently. The insertion and deletion of all suffixes of a compressed sequence of length m take O(m log_B(N + m)) amortized I/O operations. The SBC-tree index is realized inside PostgreSQL. Performance results illustrate that using the SBC-tree to index RLE-compressed sequences achieves up to an order of magnitude reduction in storage, while retaining the optimal search performance achieved by the String B-tree over the uncompressed sequences.
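
    To make the compressed representation concrete, here is a minimal Run-Length-Encoding sketch (ours, not from the paper); an index like the SBC-tree operates on the run pairs directly, and a query pattern is itself RLE-compressed before matching:

        def rle_encode(s):
            """Encode a string as a list of (character, run-length) pairs."""
            runs = []
            for ch in s:
                if runs and runs[-1][0] == ch:
                    runs[-1][1] += 1
                else:
                    runs.append([ch, 1])
            return [(c, n) for c, n in runs]

        def rle_decode(runs):
            return "".join(c * n for c, n in runs)

        seq = "aaaabbbcccccccca"
        runs = rle_encode(seq)
        print(runs)                     # [('a', 4), ('b', 3), ('c', 8), ('a', 1)]
        print(rle_decode(runs) == seq)  # True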

    View the Full Publication
  • 01/01/2008Data management challenges for computational transportationWalid G. Aref, Mourad Ouzzani

    Computational Transportation is an emerging discipline that poses many data management challenges. Computational transportation is characterized by the existence of a massive number of moving objects, moving sensors, and moving queries. This paper highlights important data management challenges for computational transportation and promising approaches towards addressing them.

    View the Full Publication
  • 01/01/2008Space-Filling CurvesMohamed Mokbel, Walid G. Aref

    A space-filling curve (SFC) is a way of mapping a multi-dimensional space into a one-dimensional space. It acts like a thread that passes through every cell element (or pixel) in the multi-dimensional space so that every cell is visited exactly once. Thus, a space-filling curve imposes a linear order on the points of the multi-dimensional space. A D-dimensional space-filling curve in a space of N cells (pixels) per dimension consists of N^D − 1 segments, where each segment connects two consecutive D-dimensional points. There are numerous kinds of space-filling curves (e.g., Hilbert, Peano, and Gray). These curves differ in how they map to the one-dimensional space, i.e., in the order in which a given curve traverses the multi-dimensional space. The quality of a space-filling curve is measured by its ability to preserve the locality (or relative distance) of multi-dimensional points in the mapped one-dimensional space. The main idea is that any two D-dimensional points that are close by in the D-dimensional space should also be close by in the one-dimensional space.
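
    As a concrete example (our sketch, not from this entry), the Z-order curve, one of the simplest space-filling curves, maps a 2-D point to one dimension by interleaving the bits of its coordinates:

        def z_order(x, y, bits=8):
            """Morton/Z-order code: interleave the bits of x and y so that
            the 2-D point maps to one position on a 1-D curve."""
            code = 0
            for i in range(bits):
                code |= ((x >> i) & 1) << (2 * i)      # even bit positions from x
                code |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions from y
            return code

        # Nearby points usually receive nearby codes; this is the locality
        # property that makes SFCs useful for one-dimensional indexing.
        print(z_order(3, 5))    # 39
        print(z_order(3, 6))    # 45
        print(z_order(200, 5))  # 20578: far away on the curve as well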

    View the Full Publication
  • 01/01/2008Detecting anomalous access patterns in relational databasesAshish Kamra, Evimaria Terzi, Elisa Bertino

    A considerable effort has recently been devoted to the development of Database Management Systems (DBMS) which guarantee high assurance and security. An important component of any strong security solution is represented by Intrusion Detection (ID) techniques, able to detect anomalous behavior of applications and users. To date, however, few ID mechanisms have been proposed which are specifically tailored to function within the DBMS. In this paper, we propose such a mechanism. Our approach is based on mining SQL queries stored in database audit log files. The result of the mining process is used to form profiles that can model normal database access behavior and identify intruders. We consider two different scenarios while addressing the problem. In the first case, we assume that the database has a Role Based Access Control (RBAC) model in place. Under an RBAC system permissions are associated with roles, grouping several users, rather than with single users. Our ID system is able to determine role intruders, that is, individuals who, while holding a specific role, behave differently than expected. An important advantage of providing an ID technique specifically tailored to RBAC databases is that it can help in protecting against insider threats. Furthermore, the existence of roles makes our approach usable even for databases with large user populations. In the second scenario, we assume that there are no roles associated with users of the database. In this case, we look directly at the behavior of the users. We employ clustering algorithms to form concise profiles representing normal user behavior. For detection, we either use these clustered profiles as the roles or employ outlier detection techniques to identify behavior that deviates from the profiles. Our preliminary experimental evaluation on both real and synthetic database traces shows that our methods work well in practical situations.
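
    A minimal sketch of the flavor of this approach (ours; the feature representation is a hypothetical simplification of the paper's profiles): summarize each query coarsely, learn per-role profiles from the audit log, and flag queries whose summary was never observed for the issuing role:

        from collections import defaultdict

        def summary(command, tables, columns):
            # Coarse representation of a SQL query: the command plus the
            # relations and attributes it touches.
            return (command, frozenset(tables), frozenset(columns))

        # Training: build per-role profiles from (role, query summary) pairs.
        audit_log = [
            ("clerk", summary("SELECT", ["orders"], ["id", "total"])),
            ("clerk", summary("SELECT", ["orders"], ["id"])),
            ("admin", summary("UPDATE", ["accounts"], ["balance"])),
        ]
        profiles = defaultdict(set)
        for role, q in audit_log:
            profiles[role].add(q)

        def is_anomalous(role, q):
            # Flag a query whose summary never occurred for this role.
            return q not in profiles[role]

        print(is_anomalous("clerk", summary("SELECT", ["orders"], ["id"])))        # False
        print(is_anomalous("clerk", summary("UPDATE", ["accounts"], ["balance"]))) # True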

    View the Full Publication
  • 01/01/2008Application of Automatic Parallelization to Modern Challenges of Scientific Computing IndustriesBrian Armstrong, Rudolf Eigenmann

    Characteristics of full applications found in scientific computing industries today lead to challenges that are not addressed by state-of-the-art approaches to automatic parallelization. These characteristics are present neither in CPU kernel codes nor in linear algebra libraries, requiring a fresh look at how to make automatic parallelization apply to today's computational industries using full applications. The challenges to automatic parallelization result from software engineering patterns that implement multifunctionality, reusable execution frameworks, data structures shared across abstract programming interfaces, a multilingual code base for a single application, and the observation that full applications demand more from compile-time analysis than CPU kernel codes do. Each of these challenges has a detrimental impact on the compile-time analysis required for automatic parallelization. Focusing on a set of target loops that are parallelizable by hand and that result in speedups on par with the distributed parallel version of the full applications, we then determine the prevalence of a number of issues that hinder automatic parallelization. These issues point to enabling techniques that are missing from the state of the art. In order for automatic parallelization to be utilized in today's scientific computing industries, the challenges described in this paper must be addressed.

    View the Full Publication
  • 01/01/2008Adaptive Runtime Tuning of Parallel Sparse Matrix-Vector Multiplication on Distributed Memory SystemsSeyong Lee, Rudolf Eigenmann

    Sparse matrix-vector (SpMV) multiplication is a widely used kernel in scientific applications. In these applications, the SpMV multiplication is usually deeply nested within multiple loops and thus executed a large number of times. We have observed that there can be significant performance variability, due to irregular memory access patterns. Static performance optimizations are difficult because the patterns may be known only at runtime. In this paper, we propose adaptive runtime tuning mechanisms to improve the parallel performance on distributed memory systems. Our adaptive iteration-to-process mapping mechanism balances computational load at runtime with negligible overhead (1% on average), and our runtime communication selection algorithm searches for the best communication method for a given data distribution and mapping. Actual runs on 26 real matrices show that our runtime tuning system reduces execution time up to 68.8% (30.9% on average) over a base block distributed parallel algorithm on distributed systems with 32 nodes.
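
    As a rough illustration of runtime iteration-to-process mapping (our sketch, not the paper's algorithm), iterations can be greedily reassigned to the least-loaded process using per-iteration costs measured at runtime, e.g., nonzeros per matrix row:

        import heapq

        def rebalance(row_costs, num_procs):
            """Greedy mapping: assign each row (iteration) to the currently
            least-loaded process, heaviest rows first."""
            heap = [(0.0, p, []) for p in range(num_procs)]
            heapq.heapify(heap)
            for row, cost in sorted(enumerate(row_costs), key=lambda rc: -rc[1]):
                load, p, rows = heapq.heappop(heap)
                rows.append(row)
                heapq.heappush(heap, (load + cost, p, rows))
            return {p: rows for _, p, rows in sorted(heap, key=lambda t: t[1])}

        # Per-row costs observed at runtime for a 6-row sparse matrix.
        print(rebalance([9, 1, 1, 8, 2, 3], num_procs=2))
        # {0: [0, 4, 1], 1: [3, 5, 2]} -- both processes carry a load of 12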

    View the Full Publication
  • 01/01/2008Optimizing irregular shared-memory applications for clustersSeung-Jai Min, Rudolf Eigenmann

    Irregular applications pose challenges in optimizing communication, due to the difficulty of analyzing irregular data accesses accurately and efficiently. This challenge is especially significant when translating irregular shared-memory applications to message-passing form for clusters. The lack of effective irregular data analysis in the translation system results in unnecessary or redundant communication, which limits application scalability. In this paper, we present a Lean Distributed Shared Memory (LDSM) system, which features a fast and accurate irregular data access (IDA) analysis. The analysis uses a region-based diff method and makes use of a runtime library that is optimized for irregular applications. We describe three optimizations that improve the LDSM system performance. A parallel array reduction transformation reduces overheads in the analysis. A packed communication optimization and a differential communication optimization effectively eliminate unnecessary and redundant messages. We evaluate the performance of the optimized LDSM system on a set of representative irregular benchmarks. The optimized LDSM executes irregular applications on average 45% faster than the hand-tuned MPI applications.

    View the Full Publication
  • 01/01/2008Adaptive tuning in a dynamically changing resource environmentSeyong Lee, Rudolf Eigenmann

    We present preliminary results of a project to create a tuning system that adaptively optimizes programs to the underlying execution platform. We show initial results from two related efforts: (i) our tuning system can efficiently select the best combination of compiler options when translating programs to a target system; (ii) by tuning irregular applications that operate on sparse matrices, our system is able to achieve substantial performance improvements on cluster platforms. This project is part of a larger effort that aims at creating a global information-sharing system, where resources, such as software applications, computer platforms, and information, can be shared, discovered, and adapted to local needs.

    View the Full Publication
  • 01/01/2008Efficient content search in iShare, a P2P based Internet-sharing systemSeyong Lee, Xiaojuan Ren, Rudolf Eigenmann

    This paper presents an efficient content search system, which is applied to iShare, a distributed peer-to-peer (P2P) Internet sharing system. iShare facilitates the sharing of diverse resources located in different administrative domains over the Internet. For efficient resource management, iShare organizes resources into a hierarchical name space, which is distributed over the underlying structured P2P network. However, iShare's search capability has a fundamental limit inherited from the underlying structured P2P system. Most existing structured P2P systems do not support content searches. Some research provides content-search functionality, but the approaches do not scale well and incur substantial overhead on data updates. To address these issues, we propose an efficient hierarchical summary system, which enables efficient content search and semantic ranking over traditional structured P2P systems. Our system uses a hierarchical name space to implement a summary hierarchy on top of existing structured P2P overlay networks, and uses a Bloom filter as a summary structure to reduce space and maintenance overhead. We implemented the proposed system in iShare, and the results show that our search system finds all relevant results regardless of summary scale and that the search latency increases very slowly as the network grows.
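
    A minimal sketch of the summary structure (ours; the paper's hierarchy and parameters differ): each node keeps a Bloom filter over its content keywords, and a parent's filter is the bitwise OR of its children's, so a search can prune entire subtrees:

        import hashlib

        class BloomFilter:
            """Compact set summary: membership tests may give false positives
            but never false negatives, which suits search summaries."""

            def __init__(self, size=256, hashes=3):
                self.size, self.hashes, self.bits = size, hashes, 0

            def _positions(self, item):
                for i in range(self.hashes):
                    h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
                    yield int(h, 16) % self.size

            def add(self, item):
                for pos in self._positions(item):
                    self.bits |= 1 << pos

            def __contains__(self, item):
                return all(self.bits >> pos & 1 for pos in self._positions(item))

            def merge(self, other):
                # Parent nodes hold the OR of their children's filters.
                merged = BloomFilter(self.size, self.hashes)
                merged.bits = self.bits | other.bits
                return merged

        leaf_a, leaf_b = BloomFilter(), BloomFilter()
        leaf_a.add("climate dataset")
        leaf_b.add("matrix solver")
        parent = leaf_a.merge(leaf_b)
        print("climate dataset" in parent)  # True: descend into this subtree
        print("genome browser" in parent)   # almost surely False: prune it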

    View the Full Publication
  • 01/01/2008Incorporation of OpenMP Memory Consistency into Conventional Dataflow AnalysisAyon Basumallik, Rudolf Eigenmann

    Current OpenMP compilers are often limited in their analysis and optimization of OpenMP programs by the challenge of incorporating OpenMP memory consistency semantics into conventional data flow algorithms. An important reason for this is that data flow analysis within current compilers traverse the program’s control-flow graph (CFG) and the CFG does not accurately model the memory consistency specifications of OpenMP. In this paper, we present techniques to incorporate memory consistency semantics into conventional dataflow analysis by transforming the program’s CFG into an OpenMP Producer-Consumer Flow Graph (PCFG), where a path exists from writes to reads of shared data if and only if a dependence is implied by the OpenMP memory consistency model. We present algorithms for these transformations, prove the correctness of these algorithms and discuss a case where this transformation is used.

    View the Full Publication
  • 11/01/2007Collaborative Computing: Networking, Applications and Worksharing, 2007. CollaborateCom 2007. International Conference onElisa Bertino

    The following topics are dealt with: secure sharing and delegation; access control and trust management; sensor networks; privacy; collaborative educations; SOA (service-oriented architecture); collaborative modeling; distributed processing; ad-hoc networks.

    View the Full Publication
  • 11/01/2007A structure preserving approach for securing XML documentsMohamed Nabeel, Elisa Bertino

    With the widespread adoption of XML as the message format to disseminate content over distributed systems, including Web Services and Publish-Subscribe systems, different methods have been proposed for securing messages. We focus on a subset of such systems in which incremental updates are disseminated. The goal of this paper is to develop an approach for disseminating only the updated or accessible portions of XML content while assuring confidentiality and integrity at the message level. While sending only the updates greatly reduces the bandwidth requirements, it introduces the challenge of efficiently assuring security for partial messages disseminated to intermediaries and clients. We propose a novel localized encoding scheme based on conventional cryptographic functions to enforce security for confidentiality and content integrity at the granularity of the XML node level. We also address structural integrity with respect to the complete XML document to which clients have access. Our solution takes every possible measure to minimize indirect information leakage by keeping oblivious those portions of the XML document structure to which intermediaries and clients do not have access. The experimental results show that our scheme is superior to conventional techniques for securing XML documents when the percentage of updates with respect to the original documents is low.

    View the Full Publication
  • 11/01/2007A system for securing push-based distribution of XML documentsElisa Bertino, Elena Ferrari, Federica Paci, Loredana Provenza

    Push-based systems for distributing information through the Internet are today becoming more and more popular and widely used. The widespread use of such systems raises non-trivial security concerns. In particular, confidentiality, integrity and authenticity of the distributed data must be ensured. To cope with such issues, we describe here a system for securing push-based distribution of XML documents, which adopts digital signature and encryption techniques to ensure the above-mentioned properties and allows the specification of both signature and access control policies. We also describe the implementation of the proposed system and present an extensive performance evaluation of its main components.

    View the Full Publication
  • 11/01/2007Web services discovery in secure collaboration environmentsMohamed Shehab, Kamal Bhattacharya, Arif Ghafoor

    Multidomain application environments, where distributed domains interoperate with each other, are a reality in Web-services-based infrastructures. Collaboration enables domains to effectively share resources; however, it introduces several security and privacy challenges. In this article, we use current web service standards such as SOAP and UDDI to enable secure interoperability in a service-oriented, mediator-free environment. We propose a multihop SOAP messaging protocol that enables domains to discover secure access paths to access roles in different domains. Then we propose a path authentication mechanism based on the encapsulation of SOAP messages and the SOAP-DSIG standard. Furthermore, we provide a service discovery protocol that enables domains to discover service descriptions stored in private UDDI registries.

    View the Full Publication
  • 10/01/2007Replacing Lost or Stolen E-PassportsJianming Yong, Elisa Bertino

    The launch of e-passports raises concerns about how travellers can replace them if they're lost or stolen.

    View the Full Publication
  • 10/01/2007Quantitative Analysis of Inter-object Spatial Relationships in Biological ImagesWamiq Manzoor Ahmed, Magdalena Jonczyk, Ali Shamsaie, Arif Ghafoor, J. Paul Robinson

    The study of spatial relations between biological objects is crucial for understanding biological processes. Monitoring drug or particle delivery inside cells and studying the dynamics of subcellular proteins are two examples. Biological applications have varying demands in terms of speed and accuracy. While accuracy may be the most important factor for small-scale biology, speed is also a concern for high-content/high-throughput screening applications. In this paper we present a variety of algorithms for computing inter-object spatial relations in two- and three-dimensional space. These algorithms provide a trade-off between speed and accuracy, depending on the requirements of the application. Results for speed and accuracy are reported for real as well as synthetic data sets.

    View the Full Publication
  • 10/01/2007Knowledge Modeling for High Content Screening of Multimedia Biological DataArif Ghafoor, J. Paul Robinson

    High-content and high-throughput screening (HCS/HTS) technologies provide powerful imaging tools for the analysis of biological processes. These technologies combine sophisticated optics with automation techniques for imaging large populations of cells under different experimental perturbations, and produce enormous amounts of imaging data, including 2D images, 3D confocal data sets, time-lapse video sequences, and multispectral images. Manual analysis of such data is extremely time-consuming, and intelligent image interpretation tools have only recently started to emerge. There is a direct need for powerful automated image understanding and spatio-temporal knowledge extraction techniques for gaining useful semantic information in a biological domain consisting of multimodal multimedia data. In this tutorial we highlight key multimedia processing challenges in this domain and present a knowledge extraction and representation framework that is currently underway at Purdue University's Cytometry Laboratories and Distributed Multimedia System Laboratory. The proposed framework is being implemented using XML in order to allow extensibility and standardization.

    View the Full Publication
  • 09/29/2007Query processing of multi-way stream window joinsMoustafa Hammad, Walid G. Aref, Ahmed Elmagarmid

    This paper introduces a class of join algorithms, termed W-join, for joining multiple infinite data streams. W-join addresses the infinite nature of the data streams by joining stream data items that lie within a sliding window and that match a certain join condition. In addition to its general applicability in stream query processing, W-join can be used to track the motion of a moving object or detect the propagation of clouds of hazardous material or pollution spills over time in a sensor network environment. We describe two new algorithms for W-join and address variations and local/global optimizations related to specifying the nature of the window constraints to fulfill the posed queries. The performance of the proposed algorithms is studied experimentally in a prototype stream database system, using synthetic data streams and real time-series data. Tradeoffs of the proposed algorithms and their advantages and disadvantages are highlighted, given variations in the aggregate arrival rates of the input data streams and the desired response times per query.
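
    A minimal two-stream sketch of the windowed-join idea (ours, not one of the paper's W-join algorithms): tuples from the two streams join when their keys match and their timestamps lie within the window of each other:

        from collections import deque

        def w_join(stream_a, stream_b, window):
            """Sliding-window join of two (timestamp, key, source) streams."""
            buf_a, buf_b = deque(), deque()
            # Merge the streams by timestamp to simulate interleaved arrivals.
            for ts, key, source in sorted(stream_a + stream_b):
                own, other = (buf_a, buf_b) if source == "A" else (buf_b, buf_a)
                # Expire tuples that can no longer fall inside the window.
                while other and other[0][0] < ts - window:
                    other.popleft()
                for ots, okey, _ in other:
                    if okey == key:
                        yield (ts, ots, key)
                own.append((ts, key, source))

        a = [(1, "x", "A"), (5, "y", "A")]
        b = [(2, "x", "B"), (9, "y", "B")]
        print(list(w_join(a, b, window=3)))  # [(2, 1, 'x')]: the 'y' tuples are 4 apart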

    View the Full Publication
  • 09/01/2007FSM-based model for spatio-temporal event recognition for HCSWamiq Ahmed, Jia Liu, Dominik Lenz, Arif Ghafoor, J. Paul Robinson

    Extraction of quantitative information about spatio-temporal events happening in cells is the key to understanding biological processes. In this paper we present a finite state machine (FSM)-based model for specification and identification of spatio-temporal events at the single-cell level. Cells are modeled as objects with specific attributes such as color, size, shape, etc., and events are modeled in terms of the specific values of attributes of participating objects along with the spatial relationships between these objects. Results for a time-lapse apoptosis screen are presented where the extra information provided by per cell-based analysis is used to compensate for experimental artifacts. The model is general and is applicable to other cell-based studies also.
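
    As a toy illustration of the modeling style (ours; the attributes, thresholds, and states are hypothetical), an apoptosis-like event can be specified as a finite state machine over per-frame symbolic observations of a tracked cell:

        # FSM: a cell that is first "healthy", then "shrunken", then
        # "fragmented" triggers the (toy) apoptosis event.
        TRANSITIONS = {
            ("start", "healthy"): "alive",
            ("alive", "shrunken"): "dying",
            ("dying", "fragmented"): "apoptosis",  # accepting state
        }

        def classify(cell):
            # Map per-frame cell attributes to a symbolic observation.
            if cell["area"] < 40 and cell["pieces"] > 1:
                return "fragmented"
            if cell["area"] < 60:
                return "shrunken"
            return "healthy"

        def detect_apoptosis(frames):
            state = "start"
            for cell in frames:
                state = TRANSITIONS.get((state, classify(cell)), state)
            return state == "apoptosis"

        track = [{"area": 100, "pieces": 1}, {"area": 50, "pieces": 1},
                 {"area": 30, "pieces": 4}]
        print(detect_apoptosis(track))  # True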

    View the Full Publication
  • 09/01/2007Engineering a Policy-Based System for Federated Healthcare DatabasesRafae Bhatti, Arjmand Samuel, Mohamed Eltabakh, Haseeb Amjad, Arif Ghafoor

    Policy-based management for federated healthcare systems has recently gained increasing attention due to strict privacy and disclosure rules. While the work on privacy languages and enforcement mechanisms, such as Hippocratic databases, has advanced our understanding of designing privacy-preserving policies for healthcare databases, the need to integrate these policies in a practical healthcare framework is becoming acute. Additionally, while most work in this area has been organization-oriented, dealing with exchange of information between healthcare organizations (such as referrals), the requirements for the emerging area of personal healthcare information management have so far not been adequately addressed. These shortcomings arise from the lack of a sophisticated policy specification language and enforcement architecture that can capture the requirements for (i) integration of privacy and disclosure policies with well-known healthcare standards used in the industry in order to specify the precise requirements of a practical healthcare system, and (ii) provision of ubiquitous healthcare services to patients using the same infrastructure that enables federated healthcare management for organizations. In this paper, we have designed a policy-based system to mitigate these concerns. First, we have designed our disclosure and privacy policies using a requirements specification based on a set of use cases for the Clinical Document Architecture (CDA) standard proposed by the community. Second, we present a context-aware policy specification language which allows encoding of CDA-based requirements use cases into privacy and disclosure policy rules. We show that our policy specification language is effective in handling a variety of expressive constraints on CDA-encoded document contents. Our language enables specification of privacy-aware access control for federated healthcare information across organizational boundaries, while the use of contextual constraints allows the incorporation of user and environment context in the access control mechanism for personal healthcare information management. Moreover, the declarative syntax of the policy rules makes the policy adaptable to changes in privacy regulations or patient preferences. We also present an enforcement architecture for the federated healthcare framework proposed in this paper.

    View the Full Publication
  • 08/01/2007Modeling and language support for the management of pattern-basesPanos Vassiliadis, Spiros Skiadopoulos, Elisa Bertino, Barbara Catania, Anna Maddalena, Stefano Rizzi

    Information overloading is today a serious concern that may hinder the potential of modern web-based information systems. A promising approach to deal with this problem is represented by knowledge extraction methods able to produce artifacts (also called patterns) that concisely represent data. Patterns are usually quite heterogeneous and voluminous. So far, little emphasis has been placed on developing an overall integrated environment for uniformly representing and querying different types of patterns. In this paper we consider the larger problem of modeling, storing, and querying patterns in a database-like setting and use a Pattern-Base Management System (PBMS) for this purpose. Specifically, (a) we formally define the logical foundations for the global setting of pattern management through a model that covers data, patterns, and their intermediate mappings; (b) we present a formalism for pattern specification along with safety restrictions; and (c) we introduce predicates for comparing patterns and query operators.

    View the Full Publication
  • 07/01/2007A Policy-Based Authorization Framework for Web Services: Integrating XGTRBAC and WS-PolicyRafae Bhatti, Daniel Sanz, Elisa Bertino, Arif Ghafoor

    Authorization and access control in Web services are complicated by the unique requirements of the dynamic Web services paradigm. Current authentication mechanisms for Web services do not differentiate between users in terms of fine-grained access privileges. This results in all-or-nothing access, which is not flexible enough for the modern-day business processes that execute using Web services. In this paper, we present a policy-based authorization framework to address this requirement. We have designed a profile of the well-known WS-Policy specification tailored to meet the access control requirements in Web services by integrating WS-Policy with an access control policy specification language, X-GTRBAC. The design of the profile is aimed at bridging the gap between available policy standards for Web services and existing policy specification languages for access control. The profile supports the WS-Policy Attachment specification, which allows separate policies to be associated with multiple components of a Web service description, and one of our key contributions is the design of an algorithm to compute the effective policy for the Web service given the multiple policy attachments. To allow Web service applications to use our solution, we have adopted a component-based design approach based on well-known UML notations. We have also prototyped our architecture, and implemented it as a loosely coupled Web service providing healthcare information services to physicians subject to applicable authorization policies.

    View the Full Publication
  • 07/01/2007User Tasks and Access Control over Web ServicesJacques Thomas, Federica Paci, Elisa Bertino, Patrick Eugster

    Web services are a successful technology for enterprise information management, where they are used to expose legacy applications on the corporate intranet or in business-to-business scenarios. The technologies used to expose applications as web services have matured, stabilized, and are defined as W3C standards. Now the technology used to build applications based on web services, a process known as orchestration, is also maturing around the Web Services Business Process Execution Language (WS-BPEL). WS-BPEL falls short on one feature though: as it is focused on the orchestration of fully automatic web services, WS-BPEL does not provide means for specifying human interactions, let alone their access-control requirements. Human interactions are nonetheless needed for flexible business processes. This missing feature of WS-BPEL has been highlighted in a white paper issued jointly by IBM and SAP, which "describes scenarios where users are involved in business processes, and defines appropriate extensions to WS-BPEL to address these." These extensions, called BPEL4People, are well explained, but their implementation isn't. In this paper, we propose a language for specifying these extensions, as well as an architecture to support them. The salient advantage of our architecture is that it allows for the reuse of existing BPEL engines. In addition, our language allows for specifying these extensions within the main BPEL script, hence preserving a global view of the process. We illustrate our extensions by revisiting the classic loan approval BPEL example.

    View the Full Publication
  • 07/01/2007PP-trust-X: A system for privacy preserving trust negotiationsAnna Squicciarini, Elisa Bertino, Elena Ferrari, Federica Paci, Bhavani Thuraisingham

    Trust negotiation is a promising approach for establishing trust in open systems, in which sensitive interactions may often occur between entities with no prior knowledge of each other. Although several trust negotiation systems have been proposed to date, none of them fully addresses the problem of privacy preservation. Today, privacy is one of the major concerns of users when exchanging information through the Web, and thus we believe that trust negotiation systems must effectively address privacy issues in order to be widely applicable. For these reasons, in this paper, we investigate privacy in the context of trust negotiations. We propose a set of privacy-preserving features for inclusion in any trust negotiation system, such as support for the P3P standard, as well as a number of innovative features, such as a novel format for encoding digital credentials specifically designed for preserving privacy. Further, we present a variety of interoperable strategies to carry out the negotiation with the aim of improving both privacy and efficiency.

    View the Full Publication
  • 07/01/2007A roadmap for comprehensive online privacy policy managementAnnie Anton, Elisa Bertino, Ninghui Li, Ting Yu

    A framework supporting the privacy policy life cycle helps guide the kind of research to consider before sound privacy answers may be realized.

    View the Full Publication
  • 06/01/2007Supporting Robust and Secure Interactions in Open Domains through Recovery of Trust NegotiationsAnna Squicciarini, Alberto Trombetta, Elisa Bertino

    Trust negotiation supports authentication and access control across multiple security domains by allowing parties to use non-forgeable digital credentials to establish trust. By their nature, trust negotiation systems are used in environments that are not always reliable. In particular, it is important to protect negotiations not only against malicious attacks, but also against failures and crashes of the parties or of the communication means. To address the problem of failures and crashes, we propose an efficient and secure recovery mechanism. The mechanism includes two recovery protocols, one for each of the two main negotiation phases. In fact, because of the requirement that both services and credentials be protected on the basis of the associated disclosure policies, most approaches distinguish a phase of disclosure policy evaluation from a phase devoted to the actual credential exchange. We prove that the protocols, besides being efficient, are secure with respect to integrity and confidentiality, and are idempotent. To the best of our knowledge, this is the first effort to achieve robustness and fault tolerance in trust negotiation systems.

    View the Full Publication
  • 06/01/2007XACML Function AnnotationsPrathima Rao, Dan Lin, Elisa Bertino

    XACML is being increasingly adopted in large enterprise systems for specifying access control policies. However, the efficient analysis and integration of multiple policies in such large distributed systems still remains a difficult task. In this paper, we propose an annotation technique which is a simple extension to XACML, and may greatly benefit the policy analysis process. We also discuss an important consistency problem during XACML policy translation and point out a few possible research directions.

    View the Full Publication
  • 06/01/2007Quality-of-Service Routing in Heterogeneous Networks with Optimal Buffer and Bandwidth AllocationWaseem Sheikh, Arif Ghafoor

    We present an interdomain routing protocol for heterogeneous networks employing different queuing service disciplines. Our routing protocol finds optimal interdomain paths with maximum reliability while satisfying the end-to-end jitter and bandwidth constraints in networks employing heterogeneous queuing service disciplines. The quality-of-service (QoS) metrics are represented as functions of link bandwidth, node buffers and the queuing service disciplines employed in the routers along the path. Our scheme allows smart tuning of buffer space and bandwidth during the routing process to adjust the QoS of the interdomain path. We formulate and solve the bandwidth and buffer allocation problem for a path over heterogeneous networks consisting of different queuing service disciplines such as generalized processor sharing (GPS), packet-by-packet generalized processor sharing (PGPS) and self-clocked fair queuing (SCFQ).

    View the Full Publication
  • 04/20/2007SOLE: scalable on-line execution of continuous queries on spatio-temporal data streamsMohamed Mokbel, Walid G. Aref

    This paper presents the scalable on-line execution (SOLE) algorithm for continuous and on-line evaluation of concurrent continuous spatio-temporal queries over data streams. Incoming spatio-temporal data streams are processed in-memory against a set of outstanding continuous queries. The SOLE algorithm utilizes the scarce memory resource efficiently by keeping track of only the significant objects. In-memory stored objects are expired (i.e., dropped) from memory once they become insignificant. SOLE is a scalable algorithm where all the continuous outstanding queries share the same buffer pool. In addition, SOLE is presented as a spatio-temporal join between two input streams, a stream of spatio-temporal objects and a stream of spatio-temporal queries. To cope with intervals of high arrival rates of objects and/or queries, SOLE utilizes a load-shedding approach where some of the stored objects are dropped from memory. SOLE is implemented as a pipelined query operator that can be combined with traditional query operators in a query execution plan to support a wide variety of continuous queries. Performance experiments based on a real implementation of SOLE inside a prototype of a data stream management system show the scalability and efficiency of SOLE in highly dynamic environments.

    View the Full Publication
  • 04/01/2007Profiling Database Application to Detect SQL Injection AttacksElisa Bertino, Ashish Kamra, James Early

    Countering threats to an organization's internal databases from database applications is an important area of research. In this paper, we propose a novel framework based on anomaly detection techniques to detect malicious behavior of database application programs. Specifically, we create a fingerprint of an application program based on the SQL queries it submits to a database. We then use association rule mining techniques on this fingerprint to extract useful rules. These rules succinctly represent the normal behavior of the database application. We then apply an anomaly detection algorithm to detect queries that do not conform to these rules. We further demonstrate how this model can be used to detect SQL Injection attacks on databases. We show the validity and usefulness of our approach on synthetically generated datasets and SQL-injected queries. Experimental results show that our techniques are effective in addressing various types of SQL Injection threat scenarios.
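
    A minimal sketch of the fingerprinting idea (ours; the paper mines association rules rather than raw templates): strip literals so that queries issued by the same application code collapse to one template, then flag queries whose template never appears in the fingerprint:

        import re

        def template(query):
            """Replace literals with placeholders so structurally identical
            queries collapse to one fingerprint entry."""
            q = re.sub(r"'[^']*'", "?", query)  # string literals
            q = re.sub(r"\b\d+\b", "?", q)      # numeric literals
            return re.sub(r"\s+", " ", q).strip().upper()

        # Fingerprint the application from queries seen during normal runs.
        normal_runs = [
            "SELECT * FROM users WHERE id = 42",
            "SELECT * FROM users WHERE id = 7",
        ]
        fingerprint = {template(q) for q in normal_runs}

        def is_suspicious(query):
            return template(query) not in fingerprint

        print(is_suspicious("SELECT * FROM users WHERE id = 99"))               # False
        print(is_suspicious("SELECT * FROM users WHERE id = 99 OR '1' = '1'"))  # True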

    View the Full Publication
  • 03/01/2007Open Internet-based Sharing for Desktop Grids in iShareXiaojuan Ren, Ayon Basumallik, Zhelong Pan, Rudolf Eigenmann

    This paper presents iShare, a distributed peer-to-peer Internet-sharing system that facilitates the sharing of diverse resources located in different administrative domains over the Internet. iShare addresses the challenges of resource management in desktop grids, and integrates these resources with production grids. In this paper, we present a brief overview of the iShare system and describe how iShare leverages existing standards to provide novel solutions to the problems of resource dissemination, resource allocation, and trust in desktop grids. We also discuss how iShare integrates production grid systems, such as the TeraGrid, with desktop resources, and compare the iShare approach with Web-based user portals for production grids. To quantitatively evaluate our techniques, we measured the efficiency of resource allocation in iShare and the overheads associated with establishing trust and providing the iShare user interface for production grids. The evaluation results demonstrate that iShare enables open Internet sharing with efficiency, reliability, and security.

    View the Full Publication
  • 03/01/2007Programming Distributed Memory Systems Using OpenMPAyon Basumallik, Seung-Jai Min, Rudolf Eigenmann

    OpenMP has emerged as an important model and language extension for shared-memory parallel programming. On shared-memory platforms, OpenMP offers an intuitive, incremental approach to parallel programming. In this paper, we present techniques that extend the ease of shared-memory parallel programming in OpenMP to distributed-memory platforms as well. First, we describe a combined compile-time/runtime system that uses an underlying software distributed shared memory system and exploits repetitive data access behavior in both regular and irregular program sections. We present a compiler algorithm to detect such repetitive data references and an API to an underlying software distributed shared memory system to orchestrate the learning and proactive reuse of communication patterns. Second, we introduce a direct translation of standard OpenMP into MPI message-passing programs for execution on distributed memory systems. We present key concepts and describe techniques to analyze and efficiently handle both regular and irregular accesses to shared data. Finally, we evaluate the performance achieved by our approaches on representative OpenMP applications.

    View the Full Publication
  • 02/01/2007GEO-RBAC: A spatially aware RBACMaria Luisa Damiani, Elisa Bertino, Barbara Catania, Paolo Perlasca

    Securing access to data in location-based services and mobile applications requires the definition of spatially aware access-control systems. Even if some approaches have already been proposed, either in the context of geographic database systems or context-aware applications, a comprehensive framework, general and flexible enough to deal with spatial aspects in real mobile applications, is still missing. In this paper, we make one step toward this direction and present GEO-RBAC, an extension of the RBAC model enhanced with spatial- and location-based information. In GEO-RBAC, spatial entities are used to model objects, user positions, and geographically bounded roles. Roles are activated based on the position of the user. Besides a physical position, obtained from a given mobile terminal or a cellular phone, users are also assigned a logical and device-independent position, representing the feature (the road, the town, the region) in which they are located. To enhance flexibility and reusability, we also introduce the concept of role schema, specifying the name of the role, as well as the type of the role's spatial boundary and the granularity of the logical position. We then extend GEO-RBAC to support hierarchies, modeling permission, user, and activation inheritance, and separation-of-duty constraints. The proposed classes of constraints extend the conventional ones to deal with different granularities (schema/instance level) and spatial information. We conclude the paper with an analysis of several properties concerning the resulting model.
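
    A toy sketch of position-dependent role activation (ours; GEO-RBAC's spatial model, role schemas, and hierarchies are far richer): a role assigned to a user can be activated only while the user's position falls inside the role's spatial boundary:

        ROLE_BOUNDARIES = {
            # role: axis-aligned bounding box (xmin, ymin, xmax, ymax)
            "city_nurse": (0, 0, 10, 10),
            "campus_admin": (20, 20, 30, 30),
        }

        def contains(box, position):
            xmin, ymin, xmax, ymax = box
            x, y = position
            return xmin <= x <= xmax and ymin <= y <= ymax

        def active_roles(assigned_roles, position):
            # Of the user's assigned roles, only those whose boundary
            # contains the current position can be activated.
            return [r for r in assigned_roles
                    if contains(ROLE_BOUNDARIES[r], position)]

        print(active_roles(["city_nurse", "campus_admin"], (5, 5)))    # ['city_nurse']
        print(active_roles(["city_nurse", "campus_admin"], (25, 25)))  # ['campus_admin']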

    View the Full Publication
  • 02/01/2007An integrated approach to federated identity and privilege management in open systemsRafae Bhatti, Elisa Bertino, Arif Ghafoor

    Online partnerships depend on federations of not only user identities but also of user entitlements across organizational boundaries.

    View the Full Publication
  • 01/01/2007Efficient k-Anonymization Using Clustering TechniquesJi-Won Byun, Ashish Kamra, Elisa Bertino, Ninghui Li

    k-anonymization techniques have been the focus of intense research in the last few years. An important requirement for such techniques is to ensure anonymization of data while at the same time minimizing the information loss resulting from data modifications. In this paper we propose an approach that uses the idea of clustering to minimize information loss and thus ensure good data quality. The key observation here is that data records that are naturally similar to each other should be part of the same equivalence class. We thus formulate a specific clustering problem, referred to as the k-member clustering problem. We prove that this problem is NP-hard and present a greedy heuristic, the complexity of which is in O(n^2). As part of our approach we develop a suitable metric to estimate the information loss introduced by generalizations, which works for both numeric and categorical data.
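
    A minimal sketch in the spirit of the greedy heuristic (ours; the paper's seed selection and information-loss metric differ): grow each cluster around a seed by repeatedly absorbing the nearest remaining record until the cluster has k members:

        def distance(a, b):
            # Toy information-loss proxy for numeric records.
            return sum(abs(x - y) for x, y in zip(a, b))

        def k_member_clusters(records, k):
            remaining = list(records)
            clusters = []
            while len(remaining) >= k:
                seed = remaining.pop(0)
                cluster = [seed]
                while len(cluster) < k:
                    nearest = min(remaining, key=lambda r: distance(r, seed))
                    remaining.remove(nearest)
                    cluster.append(nearest)
                clusters.append(cluster)
            # Fewer than k leftovers join their nearest cluster.
            for r in remaining:
                min(clusters, key=lambda c: distance(r, c[0])).append(r)
            return clusters

        records = [(25, 50000), (27, 52000), (60, 90000), (62, 88000)]
        print(k_member_clusters(records, k=2))
        # [[(25, 50000), (27, 52000)], [(60, 90000), (62, 88000)]]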

    View the Full Publication
  • 01/01/2007Data Management in RFID ApplicationsDan Lin, Hicham Elmongui, Elisa Bertino, Beng Chin Ooi

    Nowadays, RFID applications have attracted a great deal of interest due to their increasing adoption in supply chain management, logistics and security. They have posed many new challenges to existing underlying database technologies, such as the requirements of supporting large volumes of data, preserving data transition paths and handling new types of queries. In this paper, we propose an efficient method to manage RFID data. We explore and take advantage of the containment relationships in the relational tables in order to support the special queries that arise in RFID applications. The experimental evaluation conducted on an existing RDBMS demonstrates the efficiency of our method.

    View the Full Publication
  • 01/01/2007Password policy simulation and analysisRichard Shay, Abhilasha Bhargav-Spantzel, Elisa Bertino

    Passwords are a ubiquitous and critical component of many security systems. As the information and access guarded by passwords become more important, we become ever more dependent upon the security passwords provide. The creation and management of passwords is crucial, and for this we must develop and deploy password policies. This paper focuses on defining and modeling password policies for the entire password policy lifecycle. The paper first discusses a language for specifying password policies. Then, a simulation model is presented with a comprehensive set of variables and the algorithm for simulating a password policy and its impact. Finally, the paper presents several simulation results using the password policy simulation tool.

    View the Full Publication
  • 01/01/2007Receipt management- transaction history based trust establishmentAbhilasha Bhargav-Spantzel, Jungha Woo, Elisa Bertino

    In a history-based trust-management system, users and service providers use information about past transactions to make trust-based decisions concerning current transactions. One category of such systems is represented by the reputation systems. However, despite the growing body of experience in building reputation systems, there are several limitations on how they are typically implemented. They often rely on scores that are evaluated by service providers and are often not reliable or well understood. We believe that reputation has to be based on objective and reliable information. In such context, transaction histories play an important role. In this paper, we present the VeryIDX system that implements an electronic receipt infrastructure and supports protocols to build and manage online transaction history of users. The receipt protocols are shown to have several essential security and privacy properties. We present a basic yet reasonably expressive language which provides service providers with a new way to establish trust based on users' transaction history. We also describe the architecture and prototype implementation of VeryIDX, based on several important design considerations of a real-world e-commerce system infrastructure.

    View the Full Publication
  • 01/01/2007Pipelined spatial join processing for quadtree-based indexesWalid G. Aref

    Spatial join is an important yet costly operation in spatial databases. In order to speed up the execution of a spatial join, the input tables are often indexed based on their spatial attributes. The quadtree index structure is a well-known index for organizing spatial database objects. It has been implemented in several database management systems, e.g., in Oracle Spatial and in PostgreSQL (via SP-GiST). Queries typically involve multiple pipelined spatial join operators that fit together in a query evaluation plan. In order to extend the applicability of these spatial joins, they are optimized so that upon receiving sorted input, they produce sorted output for the spatial join operators in the upper levels of the query evaluation pipeline. This paper investigates the use of quadtree-based spatial join algorithms and how they can be adapted to answer queries that involve multiple pipelined spatial joins in a query evaluation plan. The paper investigates several adaptations to pipelined spatial join algorithms and their performance for the cases when both input tables are indexed, when only one of the tables is indexed while the second table is sorted, and when both tables are sorted but are not indexed.

    View the Full Publication
  • 01/01/2007The New Casper: A Privacy-Aware Location-Based Database ServerMohamed Mokbel, Chin-Yin Chow, Walid G. Aref

    This demo presents Casper, a framework in which users can make use of anonymous location-based services. Casper consists of two main components: the location anonymizer, which blurs the users' exact locations into cloaked spatial regions, and the privacy-aware query processor, which is responsible for providing location-based services based on the cloaked spatial regions. While the location anonymizer is implemented as a stand-alone application, the privacy-aware query processor is embedded into PLACE, a research prototype for location-based database servers.
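
    A toy version of grid-based location blurring (ours, not Casper's anonymizer; the grid and levels are hypothetical): the exact position is replaced by the grid cell containing it, with coarser levels yielding larger cloaked regions:

        def cloak(x, y, level, world=256):
            """Blur an exact position into the enclosing grid cell; a coarser
            level gives a larger cell and hence stronger anonymity."""
            cell = world >> level  # cell side length at this level
            cx, cy = (x // cell) * cell, (y // cell) * cell
            return (cx, cy, cx + cell, cy + cell)  # cloaked rectangle

        # The query processor sees only the rectangle, never the exact point.
        print(cloak(70, 130, level=4))  # (64, 128, 80, 144): 16x16 cell
        print(cloak(70, 130, level=2))  # (64, 128, 128, 192): 64x64 cell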

    View the Full Publication
  • 01/01/2007Realizing Privacy-Preserving Features in Hippocratic DatabasesYasin Silva, Walid G. Aref

    Preserving privacy has become a crucial requirement for operating a business that manages personal data. Hippocratic databases have been proposed to answer this requirement through a database design that includes responsibility for the privacy of data as a founding tenet. We identify, study, and implement several privacy-preserving features that extend the previous work on Limiting Disclosure in Hippocratic databases. These features include support for multiple policy versions, retention time, generalization hierarchies, and multiple SQL operations. The proposed features bring Hippocratic databases one step closer to fitting real-world scenarios. We present the design and implementation guidelines for each of the proposed features. The performance evaluation shows that the cost of these extensions is small and scales well to large databases.

    View the Full Publication
  • 01/01/2007Location-Aware Query Processing and OptimizationMohamed Mokbel, Walid G. Aref

    The widespread use of cellular phones, handheld devices, and GPS-like technology enables location-aware environments where virtually all objects are aware of their locations. Such environments call for new query processing techniques that deal with the continuous movement of both spatio-temporal objects and queries. The goals of this tutorial are to: (1) give an in-depth view of supporting location-aware queries as an increasingly interesting area of research, (2) present the state-of-the-art techniques for efficient handling of location-aware snapshot/continuous queries, (3) motivate the need for integrating location-awareness as a new query processing and optimization dimension, and (4) raise several research challenges that need to be addressed towards true support for location-aware queries in database management systems.

    View the Full Publication
  • 01/01/2007Place: A Distributed Spatio-Temporal Data Stream Management System for Moving ObjectsXiaopeng Xiong, Hicham Elmongui, Xiaoyong Chai, Walid G. Aref

    In this paper, we introduce PLACE*, a distributed spatio-temporal data stream management system for moving objects. PLACE* supports continuous spatio-temporal queries that hop among a network of regional servers. To minimize the execution cost, a new Query-Track-Participate (QTP) query processing model is proposed inside PLACE*. In the QTP model, a query is continuously answered by a querying server, a tracking server, and a set of participating servers. In this paper, we focus on query plan generation, execution and update algorithms for continuous range queries in PLACE* using QTP. An extensive experimental study demonstrates the effectiveness of the proposed algorithms in PLACE*.

    View the Full Publication
  • 01/01/2007Phenomenon-Aware Stream Query ProcessingMohamed Ali, Mohamed Mokbel, Walid G. Aref

    Spatio-temporal data streams that are generated from mobile stream sources (e.g., mobile sensors) experience similar environmental conditions that result in distinct phenomena. Several research efforts are dedicated to detect and track various phenomena inside a data stream management system (DSMS). In this paper, we use the detected phenomena to reduce the demand on the DSMS resources. The main idea is to let the query processor observe the input data streams at the phenomena level. Then, each incoming continuous query is directed only to those phenomena that participate in the query answer. Two levels of indexing are employed, a phenomenon index and a query index. The phenomenon index provides a fine resolution view of the input streams that participate in a particular phenomenon. The query index utilizes the phenomenon index to maintain a query deployment map in which each input stream is aware of the set of continuous queries that the stream contributes to their answers. Both indices are updated dynamically in response to the evolving nature of phenomena and to the mobility of the stream sources. Experimental results show the efficiency of this approach with respect to the accuracy of the query result and the resource utilization of the DSMS.

    View the Full Publication
  • 01/01/2007Duplicate Elimination in Space-partitioning Tree IndexesMohamed Eltabakh, Mourad Ouzzani, Walid G. Aref

    Space-partitioning trees, like the disk-based trie, quadtree, kd-tree and their variants, are a family of access methods that index multi-dimensional objects. In the case of indexing non-zero extent objects, e.g., line segments and rectangles, space-partitioning trees may replicate objects over multiple space partitions, e.g., PMR quadtree, expanded MX-CIF quadtree, and extended kd-tree. As a result, the answer to a query over these indexes may include duplicates that need to be eliminated, i.e., the same object may be reported more than once. In this paper, we propose generic duplicate elimination techniques for the class of space-partitioning trees in the context of SP-GiST, an extensible indexing framework for realizing space-partitioning trees. The proposed techniques are embedded inside the INDEX-SCAN operator. Therefore, duplicate copies of the same object do not propagate in the query plan, and the elimination process is transparent to the end-users. Two cases for the index structures are considered based on whether or not the objects' coordinates are stored inside the index tree. The theoretical and experimental analysis illustrate that the proposed techniques achieve savings in the storage requirements, I/O operations, and processing time when compared to adding a separate duplicate elimination operator in the query plan.

    View the Full Publication
  • 01/01/2007Efficient query execution on broadcasted index tree structuresSusanne E. Hambrusch, Chuan-Ming Liu, Walid G. Aref, Sunil Prabhakar

    The continuous broadcast of data together with an index structure is an effective way of disseminating data in a wireless mobile environment. The index allows a mobile client to tune in only when relevant data is available on the channel and leads to reduced power consumption for the clients. This paper investigates the execution of queries on broadcasted index trees when query execution corresponds to a partial traversal of the tree. Queries exhibiting this behavior include range queries and nearest neighbor queries. We present two broadcast schedules for index trees and two query algorithms executed by mobile clients. Our solutions simultaneously minimize tuning time and latency and adapt to the client’s available memory. Experimental results using real and synthetic data compare results for a broadcast with node repetition to one without node repetition and show how priority-based data management can help reduce tuning time and latency.

    View the Full Publication
  • 01/01/2007Incremental Evaluation of Sliding-Window Queries over Data StreamsThanaa Ghanem, Moustafa Hammad, Mohamed Mokbel, Walid G. Aref, Ahmed Elmagarmid

    Two research efforts have been conducted to realize sliding-window queries in data stream management systems, namely, query reevaluation and incremental evaluation. In the query reevaluation method, two consecutive windows are processed independently of each other. On the other hand, in the incremental evaluation method, the query answer for a window is obtained incrementally from the answer of the preceding window. In this paper, we focus on the incremental evaluation method. Two approaches have been adopted for the incremental evaluation of sliding-window queries, namely, the input-triggered approach and the negative tuples approach. In the input-triggered approach, only the newly inserted tuples flow in the query pipeline and tuple expiration is based on the timestamps of the newly inserted tuples. On the other hand, in the negative tuples approach, tuple expiration is separated from tuple insertion where a tuple flows in the pipeline for every inserted or expired tuple. The negative tuples approach avoids the unpredictable output delays that result from the input-triggered approach. However, negative tuples double the number of tuples through the query pipeline, thus reducing the pipeline bandwidth. Based on a detailed study of the incremental evaluation pipeline, we classify the incremental query operators into two classes according to whether an operator can avoid the processing of negative tuples or not. Based on this classification, we present several optimization techniques over the negative tuples approach that aim to reduce the overhead of processing negative tuples while avoiding the output delay of the query answer. A detailed experimental study, based on a prototype system implementation, shows the performance gains of the negative tuples approach over the input-triggered approach when accompanied by the proposed optimizations.
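
    The two pipelines can be contrasted with a toy time-based sliding window that emits a positive tuple when an item enters the window and a matching negative tuple once it expires. A minimal sketch of the negative-tuples idea; the class and method names are hypothetical, not the paper's operators:

```python
from collections import deque

class NegativeTupleWindow:
    """Toy time-based sliding window that emits ('+', ts, value) on insertion
    and a matching ('-', ts, value) once the tuple expires."""
    def __init__(self, window_size):
        self.window_size = window_size
        self.buffer = deque()          # (timestamp, value) pairs in the window

    def insert(self, timestamp, value):
        out = []
        # Each expiration produces a negative tuple that flows downstream,
        # so operators need not infer expirations from new arrivals.
        while self.buffer and self.buffer[0][0] <= timestamp - self.window_size:
            out.append(('-',) + self.buffer.popleft())
        self.buffer.append((timestamp, value))
        out.append(('+', timestamp, value))
        return out

w = NegativeTupleWindow(window_size=10)
for ts, v in [(1, 'a'), (5, 'b'), (12, 'c'), (20, 'd')]:
    print(w.insert(ts, v))
```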

    View the Full Publication
  • 01/01/2007Space-Partitioning Trees in PostgreSQL: Realization and PerformanceMohamed Eltabakh, Ramy Eltarras, Walid G. Aref

    Many evolving database applications warrant the use of non-traditional indexing mechanisms beyond B+-trees and hash tables. SP-GiST is an extensible indexing framework that broadens the class of supported indexes to include disk-based versions of a wide variety of space-partitioning trees, e.g., disk-based trie variants, quadtree variants, and kd-trees. This paper presents a serious attempt at implementing and realizing SP-GiST-based indexes inside PostgreSQL. Several index types are realized inside PostgreSQL facilitated by rapid SP-GiST instantiations. Challenges, experiences, and performance issues are addressed in the paper. Performance comparisons are conducted from within PostgreSQL to compare update and search performances of SP-GiST-based indexes against the B+-tree and the R-tree for string, point, and line segment data sets. Interesting results that highlight the potential performance gains of SP-GiST-based indexes are presented in the paper.

    View the Full Publication
  • 01/01/2007R-trees with Update MemosXiaopeng Xiong, Walid G. Aref

    The problem of frequently updating multi-dimensional indexes arises in many location-dependent applications. While the R-tree and its variants are one of the dominant choices for indexing multi-dimensional objects, the R-tree exhibits inferior performance in the presence of frequent updates. In this paper, we present an R-tree variant, termed the RUM-tree (stands for R-tree with Update Memo) that minimizes the cost of object updates. The RUM-tree processes updates in a memo-based approach that avoids disk accesses for purging old entries during an update process. Therefore, the cost of an update operation in the RUM-tree reduces to the cost of only an insert operation. The removal of old object entries is carried out by a garbage cleaner inside the RUM-tree. In this paper, we present the details of the RUM-tree and study its properties. Theoretical analysis and experimental evaluation demonstrate that the RUM-tree outperforms other R-tree variants by up to a factor of eight in scenarios with frequent updates.
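
    The memo idea can be sketched independently of the R-tree machinery: every update is a plain insertion carrying a fresh stamp, the update memo records each object's latest stamp, and stale entries are filtered at query time until the garbage cleaner removes them. A minimal sketch under those assumptions, with the R-tree reduced to a flat entry list:

```python
class UpdateMemoIndex:
    """Toy memo-based index: updates are insert-only, and the update memo
    records the latest stamp per object so stale entries can be ignored."""
    def __init__(self):
        self.entries = []     # (oid, position, stamp); stands in for R-tree leaves
        self.memo = {}        # oid -> latest stamp
        self.next_stamp = 0

    def update(self, oid, position):
        # An update is a plain insertion; no disk access purges the old entry.
        self.entries.append((oid, position, self.next_stamp))
        self.memo[oid] = self.next_stamp
        self.next_stamp += 1

    def query(self):
        # Only entries carrying their object's latest stamp are live.
        return [(o, p) for (o, p, s) in self.entries if self.memo[o] == s]

    def garbage_clean(self):
        # The background cleaner physically removes obsolete entries.
        self.entries = [e for e in self.entries if self.memo[e[0]] == e[2]]

idx = UpdateMemoIndex()
idx.update('o1', (0, 0)); idx.update('o2', (5, 5)); idx.update('o1', (1, 1))
print(idx.query())   # [('o2', (5, 5)), ('o1', (1, 1))]
```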

    View the Full Publication
  • 01/01/2007Efficient k-Anonymization Using Clustering TechniquesJi-Won Byun, Ashish Kamra, Elisa Bertino, Ninghui Li

    k-anonymization techniques have been the focus of intense research in the last few years. An important requirement for such techniques is to ensure anonymization of data while at the same time minimizing the information loss resulting from data modifications. In this paper we propose an approach that uses the idea of clustering to minimize information loss and thus ensure good data quality. The key observation here is that data records that are naturally similar to each other should be part of the same equivalence class. We thus formulate a specific clustering problem, referred to as the k-member clustering problem. We prove that this problem is NP-hard and present a greedy heuristic, the complexity of which is in O(n²). As part of our approach we develop a suitable metric to estimate the information loss introduced by generalizations, which works for both numeric and categorical data.
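
    The greedy heuristic can be sketched as follows: repeatedly seed a cluster with a record far from the previous seed, then grow it one record at a time until it reaches size k. A minimal sketch that uses Euclidean distance as a stand-in for the paper's information-loss metric:

```python
def greedy_k_member(records, k, dist):
    """Greedy k-member clustering sketch: every cluster has >= k records."""
    unassigned = list(records)
    clusters = []
    seed = unassigned[0]
    while len(unassigned) >= k:
        # Seed the next cluster with the record furthest from the last seed.
        seed = max(unassigned, key=lambda r: dist(r, seed))
        cluster = [seed]
        unassigned.remove(seed)
        while len(cluster) < k:
            # Absorb the record whose addition keeps the cluster tightest
            # (a stand-in for minimizing the information-loss metric).
            best = min(unassigned, key=lambda r: max(dist(r, c) for c in cluster))
            cluster.append(best)
            unassigned.remove(best)
        clusters.append(cluster)
    for r in unassigned:   # fewer than k leftovers join their nearest cluster
        min(clusters, key=lambda c: min(dist(r, m) for m in c)).append(r)
    return clusters

euclid = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1), (9, 1)]
print(greedy_k_member(points, k=3, dist=euclid))
```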

    View the Full Publication
  • 01/01/2007Data Management in RFID ApplicationsDan Lin, Hicham Elmongui, Elisa Bertino, Beng Chin Ooi

    Nowadays, RFID applications have attracted a great deal of interest due to their increasing adoption in supply chain management, logistics and security. They have posed many new challenges to existing underlying database technologies, such as the requirements of supporting large data volumes, preserving data transition paths, and handling new types of queries. In this paper, we propose an efficient method to manage RFID data. We explore and take advantage of the containment relationships in the relational tables in order to support special queries in the RFID applications. The experimental evaluation conducted on an existing RDBMS demonstrates the efficiency of our method.

    View the Full Publication
  • 01/01/2007Password policy simulation and analysisRichard Shay, Abhilasha Bhargav-Spantzel, Elisa Bertino

    Passwords are a ubiquitous and critical component of many security systems. As the information and access guarded by passwords become more necessary, we become ever more dependent upon the security passwords provide. The creation and management of passwords are crucial, and for this we must develop and deploy password policies. This paper focuses on defining and modeling password policies for the entire password policy lifecycle. The paper first discusses a language for specifying password policies. Then, a simulation model is presented with a comprehensive set of variables and the algorithm for simulating a password policy and its impact. Finally, the paper presents several simulation results using the password policy simulation tool.

    View the Full Publication
  • 01/01/2007Receipt management- transaction history based trust establishmentAbhilasha Bhargav-Spantzel, Jungha Woo, Elisa Bertino

    In a history-based trust-management system, users and service providers use information about past transactions to make trust-based decisions concerning current transactions. One category of such systems is represented by the reputation systems. However, despite the growing body of experience in building reputation systems, there are several limitations on how they are typically implemented. They often rely on scores that are evaluated by service providers and are often not reliable or well understood. We believe that reputation has to be based on objective and reliable information. In such context, transaction histories play an important role. In this paper, we present the VeryIDX system that implements an electronic receipt infrastructure and supports protocols to build and manage online transaction history of users. The receipt protocols are shown to have several essential security and privacy properties. We present a basic yet reasonably expressive language which provides service providers with a new way to establish trust based on users' transaction history. We also describe the architecture and prototype implementation of VeryIDX, based on several important design considerations of a real-world e-commerce system infrastructure.

    View the Full Publication
  • 01/01/2007Privacy Requirements in Identity Management SolutionsAbhilasha Bhargav-Spantzel, Anna Squicciarini, Matthew Young, Elisa Bertino

    In this paper we highlight the need for privacy of user data used in digital identity management systems. We investigate the issues from the individual, business, and government perspectives. We provide surveys related to the growing problem of identity theft and the sociological concerns of individuals with respect to the privacy of their identity data. We show the privacy concerns, especially with respect to health and biometric data, where the loss of privacy of that data may have serious consequences. Moreover, we also discuss how privacy concerns change according to the individual’s disposition to provide the data. Voluntary disclosure of personal information is more acceptable to users than if information disclosure is involuntary, like in the case of surveillance. Finally, we highlight the shortcomings of current identity management systems with respect to the current privacy needs and motivate the need for stronger privacy-enabling functionalities in such systems.

    View the Full Publication
  • 01/01/2007A System for the Specification and Enforcement of Quality-Based Authentication PoliciesAnna Squicciarini, Abhilasha Bhargav-Spantzel, Elisa Bertino, Alexie Czeskis

    This paper develops a language and a reference architecture supporting the management and enforcement of authentication policies. Such language directly supports multi-factor authentication and the high level specification of authentication factors, in terms of conditions against the features of the various authentication mechanisms and modules. In addition the language supports a rich set of constraints; by using these constraints, one can specify for example that a subject must be authenticated by two credentials issued by different authorities. The paper presents a logical definition of the language and its corresponding XML encoding. It also reports an implementation of the proposed authentication system in the context of the FreeBSD Unix operating system (OS). Critical issues in the implementation are discussed and performance results are reported. These results show that the implementation is very efficient.

    View the Full Publication
  • 01/01/2007Managing Risks in RBAC Employed Distributed EnvironmentsEbru Celikel, Murat Kantarcioglu, Bhavani Thuraisingham, Elisa Bertino

    Role Based Access Control (RBAC) has been introduced in an effort to facilitate authorization in database systems. It introduces roles as a new layer in between users and permissions. This not only provides a well maintained access granting mechanism, but also alleviates the burden to manage multiple users. While providing comprehensive access control, current RBAC models and systems do not take into consideration the possible risks that can be incurred with role misuse. Distributed environments commonly involve large numbers of users, a considerable portion of whom are first-time users. This fact magnifies the need to measure risk before and after granting an access. We investigate the means of managing risks in RBAC employed distributed environments and introduce a novel probability-based risk model. Based on each role, we use information about user credentials, current user queries, role history log and expected utility to calculate the overall risk. By executing data mining on query logs, our scheme generates normal query clusters. It then assigns different risk levels to individual queries, depending on how far they are from the normal clusters. We employ three types of granularity to represent queries in our architecture. We present experimental results on real data sets and compare the performances of the three granularity levels.
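
    The distance-based part of the scheme can be illustrated simply: mine centroids of normal query clusters from the role's query log, then grade an incoming query by its distance to the nearest centroid. A minimal sketch with hypothetical feature vectors and thresholds; the paper's full model also weighs credentials, role history, and expected utility:

```python
import math

def risk_level(query_vec, centroids, thresholds=(1.0, 3.0)):
    """Grade a query by its distance to the nearest normal-cluster centroid."""
    d = min(math.dist(query_vec, c) for c in centroids)
    low, high = thresholds
    return 'low' if d <= low else ('medium' if d <= high else 'high')

# Centroids of normal query clusters mined from the role's query log.
centroids = [(0.1, 0.2), (0.8, 0.9)]
print(risk_level((0.15, 0.25), centroids))   # near a cluster -> 'low'
print(risk_level((0.5, -5.0), centroids))    # far from all clusters -> 'high'
```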

    View the Full Publication
  • 01/01/2007Decentralized authorization and data security in web content deliveryDanfeng Yao, Yunhua Koglin, Elisa Bertino, Roberto Tamassia

    The fast development of web services, or more broadly, service-oriented architectures (SOAs), has prompted more organizations to move contents and applications out to the Web. Software on the web allows one to enjoy a variety of services, for example translating texts into other languages and converting a document from one format to another. In this paper, we address the problem of maintaining data integrity and confidentiality in web content delivery when dynamic content modifications are needed. We propose a flexible and scalable model for secure content delivery based on the use of roles and role certificates to manage web intermediaries. The proxies coordinate themselves in order to process and deliver contents, and the integrity of the delivered content is enforced using a decentralized strategy. To achieve this, we utilize a distributed role lookup table and a role-number based routing mechanism. We give an efficient secure protocol, iDeliver, for content processing and delivery, and also describe a method for securely updating role lookup tables. Our solution also applies to the security problem in web-based workflows, for example maintaining the data integrity in automated trading, contract authorization, and supply chain management in large organizations.

    View the Full Publication
  • 01/01/2007An approach to evaluate policy similarityDan Lin, Prathima Rao, Elisa Bertino, Jorge Lobo

    Recent collaborative applications and enterprises very often need to efficiently integrate their access control policies. An important step in policy integration is to analyze the similarity of policies. Existing approaches to policy similarity analysis are mainly based on logical reasoning and boolean function comparison. Such approaches are computationally expensive and do not scale well for large heterogeneous distributed environments (like Grid computing systems). In this paper, we propose a policy similarity measure as a filter phase for policy similarity analysis. This measure provides a lightweight approach to pre-compile a large amount of policies and only return the most similar policies for further evaluation. In the paper we formally define the measure, by taking into account both the case of categorical attributes and numeric attributes. Detailed algorithms are presented for the similarity computation. Results of our case study demonstrate the efficiency and practical value of our approach.
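
    A filter of this kind can be sketched by averaging per-attribute scores: Jaccard similarity for categorical values and overlap ratio for numeric ranges. The scoring below is an illustrative assumption, not the paper's exact measure:

```python
def jaccard(a, b):
    """Similarity of two categorical value sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def overlap_ratio(r1, r2):
    """Overlap of two numeric ranges (lo, hi) relative to their union."""
    inter = max(0.0, min(r1[1], r2[1]) - max(r1[0], r2[0]))
    union = max(r1[1], r2[1]) - min(r1[0], r2[0])
    return inter / union if union > 0 else 1.0

def policy_similarity(p1, p2):
    """Average per-attribute similarity over the attributes both policies use."""
    shared = p1.keys() & p2.keys()
    if not shared:
        return 0.0
    total = sum(overlap_ratio(p1[a], p2[a]) if isinstance(p1[a], tuple)
                else jaccard(p1[a], p2[a]) for a in shared)
    return total / len(shared)

p1 = {'role': ['doctor', 'nurse'], 'hours': (8.0, 18.0)}
p2 = {'role': ['doctor'], 'hours': (9.0, 17.0)}
print(policy_similarity(p1, p2))   # 0.65 -> candidate for full analysis
```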

    View the Full Publication
  • 01/01/2007Privacy-aware role based access controlQun Ni, Alberto Trombetta, Elisa Bertino, Jorge Lobo

    Privacy has been acknowledged to be a critical requirement for many business (and non-business) environments. Therefore, the definition of an expressive and easy-to-use privacy related access control model, based on which privacy policies can be specified, is crucial. In this work we introduce a family of models (P-RBAC) that extend the well known RBAC model in order to provide full support for expressing highly complex privacy-related policies, taking into account features like purposes and obligations. We also compare our work with access control and privacy policy frameworks such as P3P, EPAL, and XACML.

    View the Full Publication
  • 01/01/2007A privacy preserving assertion based policy language for federation systemsAnna Squicciarini, Ayca Azgin Hintoqlu, Elisa Bertino, Yucel Saygin

    Identity federation systems enable participating organizations to provide services to qualified individuals and manage their identity attributes at an inter-organizational level. Most importantly, they empower individuals with control over the usage of their attributes within the federation via enforcement of various policies. Among such policies, one of the most important yet least mature is the privacy policy. Existing frameworks proposed for privacy-preserving federations lack the capability to support complex data-usage preferences in the form of obligations, i.e. the privacy related actions that must be performed upon certain actions on a specific piece of information. Moreover, they do not account for the history of events resulting from the interactions among federation entities.

    To address these deficiencies we propose an extension to an existing assertion based policy language. More specifically, we provide a new set of assertions to define the privacy related properties of a federation system. We extend the common definition of privacy preference policies with obligation preferences. Finally, we illustrate how the proposed framework is realized among service providers to ensure proper enforcement of privacy policies and obligations.

    View the Full Publication
  • 01/01/2007Privacy preserving schema and data matchingMonica Scannapieco, Ilya Figotin, Elisa Bertino, Ahmed Elmagarmid

    In many business scenarios, record matching is performed across different data sources with the aim of identifying common information shared among these sources. However such need is often in contrast with privacy requirements concerning the data stored by the sources. In this paper, we propose a protocol for record matching that preserves privacy both at the data level and at the schema level. Specifically, if two sources need to identify their common data, by running the protocol they can compute the matching of their datasets without sharing their data in clear and only sharing the result of the matching. The protocol uses a third party, and maps records into a vector space in order to preserve their privacy. Experimental results show the efficiency of the matching protocol in terms of precision and recall as well as the good computational performance.

    View the Full Publication
  • 01/01/2007Information carrying identity proof treesWilliam Winsborough, Anna Squicciarini, Elisa Bertino

    In open systems, the verification of properties of subjects is crucial for authorization purposes. Very often access to resources is based on policies that express (possibly complex) requirements in terms of what are referred to variously as identity properties, attributes, or characteristics of the subject. In this paper we provide an approach that an entity called a verifier can use to evaluate queries about properties of a subject requesting resources that are relevant to deciding whether the requested action is authorized. Specifically, we contribute techniques that enable reuse of previously computed query results. We consider issues related to temporal validity as well as issues related to confidentiality when one entity reuses query results computed by another entity. We employ constraint logic programming as the foundation of our policy rules and query evaluation. This provides a very general, flexible basis, and enables our work to be applied more or less directly to several existing policy frameworks. The process of evaluation of a query against a subject identity is traced through a structure, referred to as an identity proof tree, that carries all information proving that a policy requirement is met.

    View the Full Publication
  • 01/01/2007Challenges of Testing Web Services and Security in SOA ImplementationsAbbie Barbir, Chris Hobbs, Elisa Bertino, Frederick Hirsch, Lorenzo D. Martino

    The World Wide Web is evolving into a medium providing a wide array of e-commerce, business-to-business, business-to-consumer, and other information-based services. In Service Oriented Architecture (SOA) technology, Web Services are emerging as the enabling technology that bridges decoupled systems across various platforms, programming languages, and applications. The benefits of Web Services and SOA come at the expense of introducing a new level of complexity to the environments where these services are deployed. This complexity is compounded by the freedom to compose Web Services to address requirements such as quality of service (QoS), availability, security, reliability, and cost. The complexity of composing services compounds the task of securing, testing, and managing the quality of the deployed services. This chapter identifies the main security requirements for Web Services and describes how such security requirements are addressed by standards for Web Services security recently developed or under development by various standardizations bodies. Standards are reviewed according to a conceptual framework that groups them by the main functionalities they provide. Testing composite services in SOA environment is a discipline at an early stage of study. The chapter provides a brief overview of testing challenges that face early implementers of composite services in SOA taking into consideration Web Services security. The importance of Web Services Management systems in Web Services deployment is discussed. A step toward a fault model for Web Services is provided. The chapter investigates the use of crash-only software development techniques for enhancing the availability of Web Services. The chapter discusses security mechanisms from the point of view of interoperability of deployed services. The work discusses the concepts and strategies as developed by the WS-I Basic Security profile for enhancing the interoperability of secure Web Services.

    View the Full Publication
  • 01/01/2007Trust Negotiation in Identity ManagementAbhilasha Bhargav-Spantzel, Anna Squicciarini, Elisa Bertino

    Most organizations require the verification of personal information before providing services, and the privacy of such information is of growing concern. The authors show how federated identity management systems can better protect users' information when integrated with trust negotiation. In today's increasingly competitive business environment, more and more leading organizations are building Web-based infrastructures to gain the strategic advantages of collaborative networking. However, to facilitate collaboration and fully exploit such infrastructures, organizations must identify each user in the collaborative network as well as the resources each user is authorized to access. User identification and access control must be carried out so as to maximize user convenience and privacy without increasing organizations' operational costs. A federation can serve as the basic context for determining suitable solutions to this issue. A federation is a set of organizations that establish trust relationships with respect to the identity information (the federated identity information) that is considered valid. A federated identity management system (IdM) provides a group of organizations that collaborate with mechanisms for managing and gaining access to user identity information and other resources across organizational boundaries.

    View the Full Publication
  • 01/01/2007A Critique of the ANSI Standard on Role-Based Access ControlNinghui Li, Ji-Won Byun, Elisa Bertino

    In 2004, the American National Standards Institute approved the Role-Based Access Control standard to fulfill "a need among government and industry purchasers of information technology products for a consistent and uniform definition of role based access control (RBAC) features". Such uniform definitions give IT product vendors and customers a common and unambiguous terminology for RBAC features, which can lead to wider adoption of RBAC and increased productivity. However, the current ANSI RBAC Standard has several limitations, design flaws, and technical errors that, if unaddressed, could lead to confusion among IT product vendors and customers and to RBAC implementations with different semantics, thus defeating the standard's purpose.

    View the Full Publication
  • 01/01/2007An Analysis Study on Zone-Based Anonymous Communication in Mobile Ad Hoc NetworksXiaoxin Wu, Elisa Bertino

    A zone-based anonymous positioning routing protocol for ad hoc networks, enabling anonymity of both source and destination, is proposed and analyzed. According to the proposed algorithm, a source sends data to an anonymity zone, where the destination node and a number of other nodes are located. The data is then flooded within the anonymity zone so that a tracer is not able to determine the actual destination node. Source anonymity is also enabled because the positioning routing algorithms require neither the source ID nor its position for correct routing. We develop anonymity protocols for both routeless and route-based data delivery algorithms. To evaluate anonymity, we propose a "measure of anonymity" and we develop an analytical model to evaluate it. By using this model we perform an extensive analysis of the anonymity protocols to determine the parameters that most impact the anonymity level.
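
    One standard way to quantify such a measure of anonymity is the entropy of the attacker's probability distribution over the candidate nodes in the anonymity zone, normalized so that 1.0 means all candidates are equally likely. A minimal sketch of that idea; the paper's analytical model is more detailed:

```python
import math

def anonymity(probabilities):
    """Entropy of the attacker's distribution over candidate destinations,
    normalized to [0, 1]; 1.0 means all zone members are equally likely."""
    h = -sum(p * math.log2(p) for p in probabilities if p > 0)
    h_max = math.log2(len(probabilities))
    return h / h_max if h_max > 0 else 0.0

print(anonymity([1 / 8] * 8))           # 1.0: 8 equally likely nodes
print(anonymity([0.65] + [0.05] * 7))   # ~0.64: one node stands out
```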

    View the Full Publication
  • 01/01/2007A dynamic key management solution to access hierarchyXukai Zou, Yogesh Karandikar, Elisa Bertino

    Hierarchical access control (HAC) has been a fundamental problem in computer and network systems. Since Akl and Taylor proposed the first HAC scheme based on number theory in 1983, cryptographic key management techniques for HAC have appeared as a new and promising class of solutions to the HAC problem. Many cryptographic HAC schemes have been proposed in the past two decades. One common feature associated with these schemes is that they basically limit dynamic operations to the node level. In this paper, by introducing the innovative concept of ‘access polynomial’ and representing a key value as the sum of two polynomials in a finite field, we propose a new key management scheme for dynamic access hierarchy. The newly proposed scheme supports full dynamics at both the node level and user level in a uniform yet efficient manner. Furthermore, the new scheme allows access hierarchy to be a random structure and can be flexibly adapted to many other access models such as ‘transfer down’ and ‘depth-limited transfer’.
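
    The flavor of an access polynomial can be conveyed with a toy construction over a prime field: publish P(x) = A(x) + K, where A(x) = ∏(x − s_i) vanishes at every authorized user's secret point s_i; an authorized user recovers K by evaluating P at its own point, while other points yield unrelated values. This is a deliberate simplification of the paper's scheme, which represents the key as the sum of two polynomials and supports full dynamics:

```python
P_FIELD = 2**31 - 1   # prime modulus; all arithmetic is in GF(P_FIELD)

def poly_mul_linear(coeffs, s, p=P_FIELD):
    """Multiply a polynomial (coefficients, lowest degree first) by (x - s)."""
    out = [0] * (len(coeffs) + 1)
    for i, c in enumerate(coeffs):
        out[i] = (out[i] - s * c) % p       # the -s*c term stays at degree i
        out[i + 1] = (out[i + 1] + c) % p   # the x*c term moves up one degree
    return out

def access_polynomial(secret_points, key, p=P_FIELD):
    """P(x) = prod_i (x - s_i) + key, so P(s_i) = key at authorized points."""
    coeffs = [1]
    for s in secret_points:
        coeffs = poly_mul_linear(coeffs, s, p)
    coeffs[0] = (coeffs[0] + key) % p
    return coeffs

def evaluate(coeffs, x, p=P_FIELD):
    """Horner evaluation of the polynomial at x."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

authorized = [123456, 777777, 424242]   # each user's secret point
key = 987654321
P = access_polynomial(authorized, key)
print(evaluate(P, 123456) == key)       # True: an authorized point recovers the key
print(evaluate(P, 999999) == key)       # False: other points see an unrelated value
```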

    View the Full Publication
  • 01/01/2007Failure-aware checkpointing in fine-grained cycle sharing systemsXiaojuan Ren, Rudolf Eigenmann, Saurabh Bagchi

    Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of idle computational resources available on the Internet. Such systems allow guest jobs to run on a host if they do not significantly impact the local users of the host. Since the hosts are typically provided voluntarily, their availability fluctuates greatly. To provide fault tolerance to guest jobs without adding significant computational overhead, we propose failure-aware checkpointing techniques that apply the knowledge of resource availability to select checkpoint repositories and to determine checkpoint intervals. We present the schemes of selecting reliable and efficient repositories from the non-dedicated hosts that contribute their disk storage. These schemes are formulated as 0/1 programming problems to optimize the network overhead of transferring checkpoints and the work lost due to unavailability of a storage host when needed to recover a guest job. We determine the checkpoint interval by comparing the cost of checkpointing immediately and the cost of delaying that to a later time, which is a function of the resource availability. We evaluate these techniques on an FGCS system called iShare, using trace-based simulation. The results show that they achieve better application performance than the prevalent methods which use checkpointing with a fixed periodicity on dedicated checkpoint servers.
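
    The interval decision reduces to a cost comparison: checkpoint now and pay the overhead, or delay and risk losing the unsaved work if the resource becomes unavailable in the meantime. A minimal sketch of that comparison; the cost terms and the unavailability probability stand in for the paper's availability-based estimates:

```python
def should_checkpoint_now(ckpt_cost, work_since_ckpt, delay, p_unavail):
    """Checkpoint immediately iff its cost is below the expected cost of
    waiting `delay` seconds: with probability p_unavail the host becomes
    unavailable first and the unsaved work (plus the delay) is lost."""
    expected_delay_cost = (p_unavail * (work_since_ckpt + delay)
                           + (1 - p_unavail) * ckpt_cost)
    return ckpt_cost < expected_delay_cost

# 30 s to write a checkpoint, 10 min of unsaved work, a 5-min delay, and a
# 10% chance of the host becoming unavailable during that delay.
print(should_checkpoint_now(30, 600, 300, 0.10))   # True -> checkpoint now
```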

    View the Full Publication
  • 01/01/2007Speculative thread decomposition through empirical optimizationTroy Johnson, Rudolf Eigenmann, T.N. Vijaykumar

    Chip multiprocessors (CMPs), or multi-core processors, have become a common way of reducing chip complexity and power consumption while maintaining high performance. Speculative CMPs use hardware to enforce dependence, allowing a parallelizing compiler to generate multithreaded code without needing to prove independence. In these systems, a sequential program is decomposed into threads to be executed in parallel; dependent threads cause performance degradation, but do not affect correctness. Thread decomposition attempts to reduce the run-time overheads of data dependence, thread misprediction, and load imbalance. Because these overheads depend on the runtimes of the threads that are being created by the decomposition, reducing the overheads while creating the threads is a circular problem. Static compile-time decomposition handles this problem by estimating the run times of the candidate threads, but is limited by the estimates' inaccuracy. Dynamic execution-time decomposition in hardware has better run-time information, but is limited by the decomposition hardware's complexity and run-time overhead. We propose a third approach where a compiler instruments a profile run of the application to search through candidate threads and pick the best threads as the profile run executes. The resultant decomposition is compiled into the application so that a production run of the application has no instrumentation and does not incur any decomposition overhead. We avoid static decomposition's estimation accuracy problem by using actual profile-run execution times to pick threads, and we avoid dynamic decomposition's overhead by performing the decomposition at profile time. Because we allow candidate threads to span arbitrary sections of the application's call graph and loop nests, an exhaustive search of the decomposition space is prohibitive, even in profile runs. To address this issue, we make the key observation that the run-time overhead of a thread depends, to the first order, only on threads that overlap with the thread in execution (e.g., in a four-core CMP, a given thread can overlap with at most three preceding and three following threads). This observation implies that a given thread affects only a few other threads, allowing pruning of the space. Using a CMP simulator, we achieve an average speedup of 3.51 on four cores for five of the SPEC CFP2000 benchmarks, which compares favorably to recent static techniques. We also discuss experiments with CINT2000.

    View the Full Publication
  • 01/01/2007Prediction of Resource Availability in Fine-Grained Cycle Sharing Systems: Empirical EvaluationXiaojuan Ren, Seyong Lee, Rudolf Eigenmann, Saurabh Bagchi

    Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of computational resources available on the Internet. In FGCS, host computers allow guest jobs to utilize the CPU cycles if the jobs do not significantly impact the local users. Such resources are generally provided voluntarily and their availability fluctuates highly. Guest jobs may fail unexpectedly, as resources become unavailable. To improve this situation, we consider methods to predict resource availability. This paper presents empirical studies on resource availability in FGCS systems and a prediction method. From studies on resource contention among guest jobs and local users, we derive a multi-state availability model. The model enables us to detect resource unavailability in a non-intrusive way. We analyzed the traces collected from a production FGCS system for 3 months. The results suggest the feasibility of predicting resource availability, and motivate our method of applying semi-Markov Process models for the prediction. We describe the prediction framework and its implementation in a production FGCS system, named iShare. Through the experiments on an iShare testbed, we demonstrate that the prediction achieves an accuracy of 86% on average and outperforms linear time series models, while the computational cost is negligible. Our experimental results also show that the prediction is robust in the presence of irregular resource availability. We tested the effectiveness of the prediction in a proactive scheduler. Initial results show that applying availability prediction to job scheduling reduces the number of jobs failed due to resource unavailability.
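
    The prediction can be illustrated with the discrete-time core of such a model: estimate transition probabilities between availability states from the observed trace, then compute the probability of staying in available states over a horizon. A minimal Markov-chain sketch; the paper's semi-Markov formulation additionally models state durations:

```python
from collections import defaultdict

def fit_transitions(trace):
    """Estimate transition probabilities from an observed state sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(trace, trace[1:]):
        counts[a][b] += 1
    return {s: {t: n / sum(nxt.values()) for t, n in nxt.items()}
            for s, nxt in counts.items()}

def prob_available(trans, state, steps, available_states):
    """Probability of remaining within `available_states` for `steps` steps."""
    dist = {state: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for s, p in dist.items():
            for t, q in trans.get(s, {}).items():
                if t in available_states:   # mass moving to 'U' is dropped
                    nxt[t] += p * q
        dist = nxt
    return sum(dist.values())

# 'F' fully available, 'P' partially available, 'U' unavailable.
trace = list('FFFPFFPPFFFUFFFPFF')
print(prob_available(fit_transitions(trace), 'F', steps=5,
                     available_states={'F', 'P'}))
```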

    View the Full Publication
  • 01/01/2007Fast and Effective Orchestration of Compiler Optimizations for Automatic Performance TuningZhelong Pan, Rudolf Eigenmann

    Although compile-time optimizations generally improve program performance, degradations caused by individual techniques are to be expected. One promising research direction to overcome this problem is the development of dynamic, feedback-directed optimization orchestration algorithms, which automatically search for the combination of optimization techniques that achieves the best program performance. The challenge is to develop an orchestration algorithm that finds, in an exponential search space, a solution that is close to the best, in acceptable time. In this paper, we build such a fast and effective algorithm, called Combined Elimination (CE). The key advance of CE over existing techniques is that it takes the least tuning time (57% of the closest alternative), while achieving the same program performance. We conduct the experiments on both a Pentium IV machine and a SPARC II machine, by measuring performance of SPEC CPU2000 benchmarks under a large set of 38 GCC compiler options. Furthermore, through orchestrating a small set of optimizations causing the most degradation, we show that the performance achieved by CE is close to the upper bound obtained by an exhaustive search algorithm. The gap is less than 0.2% on average.
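
    The elimination-style search can be sketched as a loop: with all remaining options on, measure the effect of switching each one off, drop the option whose removal helps the most, and stop when every remaining option helps. The sketch below is a simplified variant of CE with a pluggable measurement function, not the paper's exact batching rule:

```python
import math

def combined_elimination(options, runtime):
    """Simplified CE-style search: start with all options on and repeatedly
    drop the option whose removal most improves the measured runtime."""
    kept = set(options)
    base = runtime(kept)
    while kept:
        trials = {o: runtime(kept - {o}) for o in kept}
        worst = min(trials, key=trials.get)   # option whose removal helps most
        if trials[worst] >= base:
            break                             # every remaining option helps
        kept.remove(worst)
        base = trials[worst]
    return kept

# Toy measurement: option 'b' slows the program by 10%, the others help.
effects = {'a': 0.97, 'b': 1.10, 'c': 0.99}
runtime = lambda opts: 100 * math.prod(effects[o] for o in opts)
print(combined_elimination({'a', 'b', 'c'}, runtime))   # {'a', 'c'}
```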

    View the Full Publication
  • 01/01/2007XML-Based Policy Engineering Framework for Heterogeneous Network ManagementArjmand Samuel, Shahab Baqai, Arif Ghafoor

    Network services, resources, protocols, and communication technologies give rise to multiple dimensions of heterogeneity in an enterprise, resulting in a complex administration and management operation. The administrative challenge is further exacerbated by the fact that multiple enterprises share resources and users, giving rise to semantic conflicts. While Policy-Based Network Management (PBNM) provides an effective means of automating and simplifying administrative tasks, it can also cause conflicts between policies meant for separate network entities, consequently giving rise to policy heterogeneity. In order to address issues of network and policy heterogeneity, we propose a policy engineering framework using the tried and tested system development methodologies from Software Development Life Cycle (SDLC) and apply it to PBNM language engineering. We present an XML based policy specification language, X-Enterprise, its corresponding UML meta-model along with a context sensitive and adaptive implementation framework. Use of XML and UML affords an open standard for cross-architecture implementation and use of existing UML tools for consistency and conflict analysis.

    View the Full Publication
  • 01/01/2007Knowledge Extraction for High-Throughput Biological ImagingWamiq Ahmed, Arif Ghafoor, J. Paul Robinson

    We present a multilayered architecture and spatiotemporal models for searching, retrieving, and analyzing high-throughput biological imaging data. The analysis is divided into low- and high-level processing. At the lower level, we address issues like segmentation, tracking, and object recognition. At the high level, we use finite-state-machine- and Petri-net-based models for spatiotemporal event recognition.

    View the Full Publication
  • 01/01/2007Distributed multimedia information systems: an end-to-end perspectiveArif Ghafoor

    Emerging Web-based applications require distributed multimedia information system (DMIS) infrastructures. Examples of such applications abound in the domains of medicine, entertainment, manufacturing, e-commerce, as well as military and critical national infrastructures. Development of DMIS for such applications needs a broad range of technological solutions for organizing, storing, and delivering multimedia information in an integrated, secure and timely manner with guaranteed end-to-end (E2E) quality of presentation (QoP). DMIS are viewed as catalysts for new research in many areas, ranging from basic research to applied technology. This view is a result of the fact that no single monolithic end-to-end architecture for DMIS can meet the wide spectrum of characteristics and requirements of various Web-based multimedia applications. One size does not fit all in this medium of communication. Management of integrated end-to-end QoP and ensuring information security in DMIS, when viewed in conjunction with real world constraints and system-wide performance requirements, present formidable research and implementation challenges. These challenges encompass all the sub-system components of a DMIS. The ultimate objective of achieving a comprehensive end-to-end QoP management relies on the performance and allocation of resources of each of the DMIS sub-system components including networks, databases, and end-systems. In this paper, we elaborate on these challenges and present a high level distributed architecture aimed at providing the critical functionality for a DMIS.

    View the Full Publication
  • 01/01/2007LAHVA: Linked Animal-Human Health Visual AnalyticsRoss Maciejewski, Benjamin Tyner, Yun Jang, Cheng Zheng, Rimma Nehme, David S. Ebert, William S. Cleveland, Mourad Ouzzani, Shaun Grannis, Lawrence Glickman

    Coordinated animal-human health monitoring can provide an early warning system with fewer false alarms for naturally occurring disease outbreaks, as well as biological, chemical and environmental incidents. This monitoring requires the integration and analysis of multi-field, multi-scale and multi-source data sets. In order to better understand these data sets, models and measurements at different resolutions must be analyzed. To facilitate these investigations, we have created an application to provide a visual analytics framework for analyzing both human emergency room data and veterinary hospital data. Our integrated visual analytic tool links temporally varying geospatial visualization of animal and human patient health information with advanced statistical analysis of these multi-source data. Various statistical analysis techniques have been applied in conjunction with a spatio-temporal viewing window. Such an application provides researchers with the ability to visually search the data for clusters in both a statistical model view and a spatio-temporal view. Our interface provides a factor specification/filtering component to allow exploration of causal factors and spread patterns. In this paper, we will discuss the application of our linked animal-human visual analytics (LAHVA) tool to two specific case studies. The first case study is the effect of seasonal influenza and its correlation with different companion animals (e.g., cats, dogs) syndromes. Here we use data from the Indiana Network for Patient Care (INPC) and Banfield Pet Hospitals in an attempt to determine if there are correlations between respiratory syndromes representing the onset of seasonal influenza in humans and general respiratory syndromes in cats and dogs. Our second case study examines the effect of the release of industrial wastewater in a community through companion animal surveillance.

    View the Full Publication
  • 12/01/2006Adaptive rank-aware query optimization in relational databasesIhab Ilyas, Walid G. Aref, Ahmed Elmagarmid, Hicham Elmongui, Rahul Shah, Jeffrey S. Vitter

    Rank-aware query processing has emerged as a key requirement in modern applications. In these applications, efficient and adaptive evaluation of top-k queries is an integral part of the application semantics. In this article, we introduce a rank-aware query optimization framework that fully integrates rank-join operators into relational query engines. The framework is based on extending the System R dynamic programming algorithm in both enumeration and pruning. We define ranking as an interesting physical property that triggers the generation of rank-aware query plans. Unlike traditional join operators, optimizing for rank-join operators depends on estimating the input cardinality of these operators. We introduce a probabilistic model for estimating the input cardinality, and hence the cost of a rank-join operator. To our knowledge, this is the first effort in estimating the needed input size for optimal rank aggregation algorithms. Costing ranking plans is key to the full integration of rank-join operators in real-world query processing engines.Since optimal execution strategies picked by static query optimizers lose their optimality due to estimation errors and unexpected changes in the computing environment, we introduce several adaptive execution strategies for top-k queries that respond to these unexpected changes and costing errors. Our reactive reoptimization techniques change the execution plan at runtime to significantly enhance the performance of running queries. Since top-k query plans are usually pipelined and maintain a complex ranking state, altering the execution strategy of a running ranking query is an important and challenging task.We conduct an extensive experimental study to evaluate the performance of the proposed framework. The experimental results are twofold: (1) we show the effectiveness of our cost-based approach of integrating ranking plans in dynamic programming cost-based optimizers; and (2) we show a significant speedup (up to 300%) when using our adaptive execution of ranking plans over the state-of-the-art mid-query reoptimization strategies.

    View the Full Publication
  • 12/01/2006bdbms -- A Database Management System for Biological DataMohamed Eltabakh, Mourad Ouzzani, Walid G. Aref

    Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) Annotation and provenance management including storage, indexing, manipulation, and querying of annotation and provenance as first class objects in bdbms, (2) Local dependency tracking to track the dependencies and derivations among data items, (3) Update authorization to support data curation via content-based authorization, in contrast to identity-based authorization, and (4) New access methods and their supporting operators that support pattern matching on various types of compressed biological data types. This paper presents the design of bdbms along with the techniques proposed to support these functionalities including an extension to SQL. We also outline some open issues in building bdbms.

    View the Full Publication
  • 12/01/2006Trust Negotiations with Customizable AnonymityAnna Squicciarini, Abhilasha Bhargav-Spantzel, Elisa Bertino, Elena Ferrari, Indrakshi Ray

    Trust negotiation makes it possible for two parties to carry on secure transactions by first establishing trust through a bilateral, iterative process of requesting and disclosing digital credentials and policies. Credentials, exchanged during trust negotiations, often contain sensitive attributes that attest to the properties of the credential owner. Uncontrolled disclosure of such sensitive attributes may cause grave damage to the credential owner. Research has shown that disclosing non-sensitive attributes only can cause identity to be revealed as well. Consequently, we impose a stronger requirement: our negotiations should have the k-anonymity property -- the set of credentials submitted by a subject during a negotiation should be equal to k other such sets received by the counterpart during earlier negotiations. In this paper we propose a protocol that ensures k-anonymity. Our protocol has a number of important features. First, a credential submitter before submitting its set of credentials has the assurance that its set will be identical to k other sets already stored with the counterpart. Second, we provide a cryptographic protocol ensuring that the credentials submitted by the submitter during different negotiations cannot be linked to each other. Third, we ensure that the critical data exchanged during the protocol is valid. Fourth, the major part of the protocol involves the negotiating parties only; the protocol invokes the validator only when some critical information needs to be validated.

    View the Full Publication
  • 12/01/2006STAGGER: Periodicity Mining of Data Streams Using Expanding Sliding WindowsMohamed Elfeky, Walid G. Aref, Ahmed Elmagarmid

    Sensor devices are becoming ubiquitous, especially in measurement and monitoring applications. Because of the real-time, append-only and semi-infinite natures of the generated sensor data streams, an online incremental approach is a necessity for mining stream data types. In this paper, we propose STAGGER: a one-pass, online and incremental algorithm for mining periodic patterns in data streams. STAGGER does not require that the user pre-specify the periodicity rate of the data. Instead, STAGGER discovers the potential periodicity rates. STAGGER maintains multiple expanding sliding windows staggered over the stream, where computations are shared among the multiple overlapping windows. Small-length sliding windows are imperative for early and real-time output, yet are limited to discover short periodicity rates. As streamed data arrives continuously, the sliding windows expand in length in order to cover the whole stream. Larger-length sliding windows are able to discover longer periodicity rates. STAGGER incrementally maintains a tree-like data structure for the frequent periodic patterns of each discovered potential periodicity rate. In contrast to the Fourier/Wavelet-based approaches used for discovering periodicity rates, STAGGER not only discovers a wider, more accurate set of periodicities, but also discovers the periodic patterns themselves. In fact, experimental results with real and synthetic data sets show that STAGGER outperforms Fourier/Wavelet-based approaches by an order of magnitude in terms of the accuracy of the discovered periodicity rates. Moreover, real data experiments demonstrate the practicality of the discovered periodic patterns.
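
    The underlying periodicity test can be sketched directly: for a candidate period p, score how often symbols repeat p positions apart inside the current window; windows that expand with the stream allow progressively longer periods to be examined. A minimal scoring sketch; STAGGER itself shares computation across staggered windows and maintains pattern trees, which this omits:

```python
def periodicity_score(window, p):
    """Fraction of window positions whose symbol repeats p steps later."""
    matches = sum(1 for i in range(len(window) - p) if window[i] == window[i + p])
    return matches / (len(window) - p)

def candidate_periods(window, threshold=0.8):
    """Periodicity rates scoring above the threshold; a window of length L
    can only reveal periods up to about L // 2."""
    return [p for p in range(1, len(window) // 2 + 1)
            if periodicity_score(window, p) >= threshold]

print(candidate_periods('abcabcabcabcabcabc'))   # [3, 6, 9]
```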

    View the Full Publication
  • 10/01/2006Secure Dissemination of XML Content Using Structure-based RoutingAshish Kundu, Elisa Bertino

    The paper proposes an approach to content dissemination that exploits the structural properties of the XML Document Object Model in order to provide efficient dissemination while at the same time assuring content integrity and confidentiality. Our approach is based on the notion of encrypted post-order numbers that support the integrity and confidentiality requirements of XML content as well as facilitate efficient identification, extraction and distribution of selected content portions. By using such notion, we develop a structure-based routing scheme that prevents information leaks in XML-data dissemination and assures that content is delivered to users according to the access control policies, that is, policies specifying which users can receive which portions of the contents. Our proposed dissemination approach further enhances such structure-based, policy-based routing by combining it with multicast in order to provide high efficiency in terms of bandwidth usage and speed of data delivery, thereby enhancing scalability.
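
    Post-order numbers make subtree decisions an interval test: a node whose subtree carries post-order numbers [low, high] contains exactly the nodes numbered in that range, so a router can check containment without inspecting content. A minimal sketch of that property on plain numbers; the paper additionally encrypts them:

```python
class Node:
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)
        self.low = self.num = None   # subtree covers post-order range [low, num]

def number_postorder(node, next_num=0):
    """Assign post-order numbers; returns the next unused number."""
    low = next_num
    for child in node.children:
        next_num = number_postorder(child, next_num)
    node.low, node.num = low, next_num
    return next_num + 1

def in_subtree(node, root):
    """Subtree containment is a pure interval test on post-order numbers."""
    return root.low <= node.num <= root.num

# Document <a><b><c/><d/></b><e/></a>
c, d, e = Node('c'), Node('d'), Node('e')
b = Node('b', [c, d])
a = Node('a', [b, e])
number_postorder(a)
print(in_subtree(c, b), in_subtree(e, b))   # True False
```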

    View the Full Publication
  • 09/01/2006Access Control and Authorization Constraints for WS-BPELElisa Bertino, Jason Crampton, Federica Paci

    Computerized workflow systems have attracted considerable research interest in the last fifteen years. More recently, there have been several XML-based languages proposed for specifying and orchestrating business processes, culminating in WS-BPEL. A significant omission from WS-BPEL is the ability to specify authorization information associating users with activities in the business process and authorization constraints on the execution of activities such as separation of duty. In this paper, we address these deficiencies by developing the RBAC-WS-BPEL and BPCL languages. The first of these provides for the specification of authorization information associated with a business process specified in WS-BPEL, while BPCL provides for the articulation of authorization constraints.

    View the Full Publication
  • 09/01/2006Security in SOA and Web ServicesElisa Bertino, Lorenzo D. Martino

    Security is today a relevant requirement for any distributed application, and in particular for those enabled by the Web such as e-health, e-commerce, and e-learning. It is thus crucial that the use of Web services, stand-alone or composed, provide strong security guarantees. Web services security encompasses several requirements that can be described along the well known security dimensions, that is: integrity, whereby a message must remain unaltered during transmission; confidentiality, whereby the contents of a message cannot be viewed while in transit, except by authorized services; availability, whereby a message is promptly delivered to the intended recipient, thus ensuring that legitimate users receive the services they are entitled to. Moreover, each Web service must protect its own resources against unauthorized access. This in turn requires suitable means for: identification, whereby the recipient of a message must be able to identify the sender; authentication, whereby the recipient of a message needs to verify the claimed identity of the sender; authorization, whereby the recipient of a message needs to apply access control policies to determine whether the sender has the right to use the required resources.

    View the Full Publication
  • 08/01/2006Empirical Studies on the Behavior of Resource Availability in Fine-Grained Cycle Sharing SystemsXiaojuan Ren, Rudolf Eigenmann

    Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of computational resources available on the Internet. In FGCS, host computers allow guest jobs to utilize the CPU cycles if the jobs do not significantly impact local host users. Such resources are generally provided voluntarily and their availability fluctuates highly. Guest jobs may fail unexpectedly, as resources become unavailable. We present empirical studies on the detection and predictability of resource availability in FGCS systems. A multi-state availability model is derived from a study of resource behavior. The model combines generic hardware-software failures with domain-specific resource behavior in FGCS. To understand the predictability, we traced resource availability in a production FGCS system for three months. We found that the daily patterns of resource availability are comparable to those in recent history. This observation suggests the feasibility of predicting future resource availability, which can be applied for proactive management of guest jobs.

    View the Full Publication
  • 07/01/2006Scalability Management in Sensor-Network PhenomenaBasesMohamed Ali, Walid G. Aref, Ibrahim Kamel

    A phenomenon appears in a sensor network when a group of sensors persist to generate similar behavior over a period of time. PhenomenaBases (or databases of phenomena) are equipped with Phenomena Detection and Tracking (PDT) techniques that continuously run in the background of a sensor database system to detect new phenomena and to track already existing phenomena. The process of phenomena detection and tracking is initiated by a multi-way join operator that comes at the core of PDT techniques to report similar sensor readings. With the increase in the sensor network size, the join operator (and, consequently, query processing in the PhenomenaBase) face several scalability challenges. In this paper, we present a join operator for PhenomenaBases (the SNJoin operator) that is specially-designed for dynamically-configured large-scale sensor networks with distributed processing capabilities. Experimental studies illustrate the scalability of the proposed join operator in PhenomenaBases with respect to the number of detected phenomena and the output delay.

    View the Full Publication
  • 06/01/2006Policy Languages for Digital Identity Management in Federation SystemsElisa Bertino, Abhilasha Bhargav-Spantzel, Anna Squicciarini

    The goal of service provider federations is to support a controlled method by which distributed organizations can provide services to qualified individuals and manage their identity attributes at an inter-organizational level. In order to make access control decisions the history of activities should be accounted for, therefore it is necessary to record information on interactions among the federation entities. To achieve these goals we propose a comprehensive assertion language able to support description of static and dynamic properties of the federation system. The assertions are a powerful means to describe the behavior of the entities interacting in the federation, and to define policies controlling access to services and privacy policies. We also propose a log-based approach for capturing the history of activities within the federation, implemented as a set of tables stored at databases at the various organizations in the federation. We illustrate how, by using different types of queries on such tables, security properties of the federation can be verified.

    View the Full Publication
  • 06/01/2006Resource Availability Prediction in Fine-Grained Cycle Sharing SystemsXiaojuan Ren, Seyong Lee, Rudolf Eigenmann, Saurabh Bagchi

    Fine-grained cycle sharing (FGCS) systems aim at utilizing the large amount of computational resources available on the Internet. In FGCS, host computers allow guest jobs to utilize the CPU cycles if the jobs do not significantly impact the local users of a host. A characteristic of such resources is that they are generally provided voluntarily and their availability fluctuates widely. Guest jobs may fail because of unexpected resource unavailability. Providing fault tolerance to guest jobs without adding significant computational overhead requires predicting future resource availability. This paper presents a method for resource availability prediction in FGCS systems. It applies a semi-Markov process and is based on a novel resource availability model that combines generic hardware-software failures with domain-specific resource behavior in FGCS. We describe the prediction framework and its implementation in a production FGCS system named iShare. Through experiments on an iShare testbed, we demonstrate that the prediction achieves accuracy above 86% on average and outperforms linear time series models, while its computational cost is negligible. Our experimental results also show that the prediction is robust in the presence of irregular resource unavailability.
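
    To make the flavor of such a prediction concrete, here is a minimal sketch assuming a plain discrete-time Markov chain rather than the paper's semi-Markov process, with hypothetical state names: transition probabilities are estimated from an observed state sequence, and the chance that a resource stays available over a horizon is read off the estimated matrix.

        from collections import Counter

        STATES = ["available", "user-active", "unavailable"]  # hypothetical state names

        def estimate_transitions(history):
            """Estimate P(next_state | state) from an observed state sequence."""
            pair_counts = Counter(zip(history, history[1:]))
            state_counts = Counter(history[:-1])
            return {(s, t): pair_counts[(s, t)] / state_counts[s]
                    for s in STATES for t in STATES if state_counts[s]}

        def prob_stays_available(trans, steps):
            """Probability the resource remains 'available' for `steps` consecutive steps."""
            p = trans.get(("available", "available"), 0.0)
            return p ** steps

        history = ["available"] * 8 + ["user-active", "available", "available", "unavailable", "available"]
        trans = estimate_transitions(history)
        print(round(prob_stays_available(trans, steps=3), 3))  # 0.512 for this toy trace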

    View the Full Publication
  • 05/01/2006Access Control Enforcement for Conversation-based Web ServicesMassimo Mecella, Federica Paci, Mourad Ouzzani, Elisa Bertino

    Service Oriented Computing is emerging as the main approach to building distributed enterprise applications on the Web. The widespread use of Web services is hindered by the lack of adequate security and privacy support. In this paper, we present a novel framework for enforcing access control in conversation-based Web services. Our approach takes into account the conversational nature of Web services, in contrast with existing approaches to access control enforcement that treat a Web service as a set of independent operations. Furthermore, our approach achieves a tradeoff between the need to protect a Web service's access control policies and the need to disclose to clients the portion of those policies related to the conversations they are interested in. This is important to avoid situations where the client cannot progress in the conversation because it lacks the required credentials. We introduce the concept of k-trustworthiness, which defines the conversations for which a client can provide credentials maximizing the likelihood that it will eventually reach a final state.

    View the Full Publication
  • 05/01/2006LUGrid: Update-tolerant Grid-based Indexing for Moving ObjectsXiaopeng Xiong, Mohamed Mokbel, Walid G. Aref

    Indexing moving objects is a fundamental issue in spatiotemporal databases. In this paper, we propose an adaptive Lazy-Update Grid-based index (LUGrid, for short) that minimizes the cost of object updates. LUGrid is designed with two important features, namely, lazy insertion and lazy deletion. Lazy insertion reduces update I/Os by adding a memory-resident layer over the disk index. Lazy deletion reduces update cost by not deleting each obsolete entry immediately; instead, obsolete entries are removed later by specially designed mechanisms. LUGrid adapts to object distributions through cell splitting and merging. Theoretical analysis and experimental results indicate that LUGrid outperforms prior work by up to eight times when processing intensive updates, while yielding similar search performance.
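
    A minimal sketch of the lazy-update idea (an illustration under our own assumptions, not LUGrid's actual disk layout): updates accumulate in a small in-memory layer and reach the "disk" grid only in batches, while deletion is deferred by letting a fresher buffered position shadow stale entries.

        class LazyUpdateGrid:
            """Toy grid index: an in-memory buffer absorbs updates; the 'disk' grid
            is only rewritten on batch flushes. Obsolete disk entries are not
            deleted eagerly -- a fresher buffered position simply shadows them."""

            def __init__(self, cell=10.0, flush_at=4):
                self.cell, self.flush_at = cell, flush_at
                self.disk = {}    # object id -> (x, y), the persisted position
                self.buffer = {}  # object id -> (x, y), pending lazy insertions

            def update(self, oid, x, y):
                self.buffer[oid] = (x, y)            # lazy insertion: memory only
                if len(self.buffer) >= self.flush_at:
                    self.disk.update(self.buffer)    # one batched "I/O"
                    self.buffer.clear()

            def query_cell(self, cx, cy):
                """Return objects currently in grid cell (cx, cy); buffer wins over disk."""
                latest = {**self.disk, **self.buffer}
                return [oid for oid, (x, y) in latest.items()
                        if int(x // self.cell) == cx and int(y // self.cell) == cy]

        g = LazyUpdateGrid()
        for i, pos in enumerate([(1, 1), (12, 3), (25, 25), (2, 8), (14, 4)]):
            g.update(f"obj{i}", *pos)
        print(g.query_cell(0, 0))  # objects whose latest position falls in cell (0, 0)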

    View the Full Publication
  • 05/01/2006A Decentralized Approach for Controlled Sharing of Resources in Virtual CommunitiesElisa Bertino, Anna Squicciarini

    A virtual community is a composition of heterogeneous and independently designed subsystems, focusing on large-scale resource sharing, innovative applications and, in some cases, high-performance computation. The sharing we refer to is direct access to computers, software, and data, as is emerging in fields like science, industry and engineering. Several open issues need to be addressed to make these dynamic environments possible, such as how to manage access policies to coordinate resource sharing, how to establish a community, how to ensure that member communities respect community policies, and so on.

    View the Full Publication
  • 05/01/2006Access Control and Privacy in Location-Aware Services for Mobile OrganizationsMaria Luisa Damiani, Elisa Bertino

    In mobile organizations such as enterprises operating in the field, healthcare organizations, and military and civilian coalitions, individuals may, because of the roles they hold, need to access common information resources through location-aware applications. To enable controlled and privacy-preserving access to such applications, a comprehensive conceptual framework for an access control system enhanced with location privacy is presented.

    View the Full Publication
  • 05/01/2006Secure knowledge management: confidentiality, trust, and privacyElisa Bertino, Latifur Khan, Ravi Sandhu, Bhavani Thuraisingham

    Knowledge management enhances the value of a corporation by identifying the assets and expertise as well as efficiently managing the resources. Security for knowledge management is critical as organizations have to protect their intellectual assets. Therefore, only authorized individuals must be permitted to execute various operations and functions in an organization. In this paper, secure knowledge management will be discussed, focusing on confidentiality, trust, and privacy. In particular, certain access-control techniques will be investigated, and trust management as well as privacy control for knowledge management will be explored.

    View the Full Publication
  • 05/01/2006PEAK—a fast and effective performance tuning system via compiler optimization orchestrationZhelong Pan, Rudolf Eigenmann

    Compile-time optimizations generally improve program performance. Nevertheless, degradations caused by individual compiler optimization techniques are to be expected. Feedback-directed optimization orchestration systems generate optimized code versions under a series of optimization combinations, evaluate their performance, and search for the best version. One challenge to such systems is to tune program performance quickly in an exponential search space. Another challenge is to achieve high program performance, considering that optimizations interact. Aiming at these two goals, this article presents an automated performance tuning system, PEAK, which searches for the best compiler optimization combinations for the important code sections in a program. The major contributions of this work are as follows: (1) an algorithm called Combined Elimination (CE) is developed to explore the optimization space quickly and effectively; (2) three fast and accurate rating methods are designed to evaluate the performance of an optimized code section based on a partial execution of the program; (3) an algorithm is developed to identify important code sections as candidates for performance tuning by trading off tuning speed and tuned program performance; and (4) a set of compiler tools is implemented to automate optimization orchestration. Orchestrating optimization options in SUN Forte compilers at the whole-program level, our CE algorithm improves performance by 10.8% over the SPEC CPU2000 FP baseline setting, compared to the 5.6% improvement achieved by manual tuning. Orchestrating GCC O3 optimizations, CE improves performance by 12% over O3, the highest optimization level. Applying the rating methods, PEAK reduces tuning time from 2.19 hours to 5.85 minutes on average, while achieving equal or better program performance.
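
    The following is a simplified greedy sketch of the Combined Elimination idea as the abstract describes it, not the published algorithm in full; the runtime function is a stand-in (real use would compile and time the program): start with all optimizations on, and while disabling some option reduces measured runtime, disable the option whose removal helps most.

        def combined_elimination(options, runtime):
            """Greedy sketch of Combined Elimination. `runtime(frozenset)` is a
            stand-in for compiling and timing the program with that option set."""
            enabled = set(options)
            best = runtime(frozenset(enabled))
            while True:
                trials = {opt: runtime(frozenset(enabled - {opt})) for opt in enabled}
                opt, t = min(trials.items(), key=lambda kv: kv[1])
                if t >= best:          # no single removal helps any more
                    return enabled, best
                enabled.remove(opt)    # eliminate the most harmful option
                best = t

        # Toy cost model: O2 and unroll help; 'aggressive-inline' hurts this program.
        def fake_runtime(opts):
            t = 100.0
            if "O2" in opts: t -= 20
            if "unroll" in opts: t -= 5
            if "aggressive-inline" in opts: t += 8
            return t

        print(combined_elimination(["O2", "unroll", "aggressive-inline"], fake_runtime))
        # ({'O2', 'unroll'}, 75.0): the harmful option is eliminated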

    View the Full Publication
  • 04/01/2006Private Updates to Anonymous DatabasesAlberto Trombetta, Elisa Bertino

    Suppose that Alice, owner of a k-anonymous database, needs to determine whether her database, when adjoined with a tuple owned by Bob, is still k-anonymous. Suppose moreover that access to the database is strictly controlled, because for example data are used for experiments that need to be maintained confidential. Clearly, allowing Alice to directly read the contents of the tuple breaks the privacy of Bob; on the other hand, the confidentiality of the database managed by Alice is violated once Bob has access to the contents of the database. Thus the problem is to check whether the database adjoined with the tuple is still k-anonymous, without letting Alice and Bob know the contents of, respectively, the tuple and the database. In this paper, we propose two protocols solving this problem.
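
    For reference, the underlying non-private test is easy to state; the paper's contribution is answering it without either party revealing their data, via protocols not shown here. Below is a minimal sketch of the plain check, with hypothetical quasi-identifier columns:

        from collections import Counter

        def is_k_anonymous(rows, quasi_ids, k):
            """rows: list of dicts; quasi_ids: columns an adversary may link on.
            k-anonymity holds if every quasi-identifier combination occurs >= k times."""
            groups = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
            return all(count >= k for count in groups.values())

        def still_k_anonymous_after_insert(rows, new_tuple, quasi_ids, k):
            """The (non-private) question Alice and Bob want answered jointly."""
            return is_k_anonymous(rows + [new_tuple], quasi_ids, k)

        db = [{"zip": "479**", "age": "30-40", "disease": "flu"}] * 3
        bob = {"zip": "479**", "age": "30-40", "disease": "cold"}
        print(still_k_anonymous_after_insert(db, bob, ["zip", "age"], k=3))  # True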

    View the Full Publication
  • 04/01/2006Technique for Optimal Adaptation of Time-Dependent Workflows with Security ConstraintsBasit Shafiq, Arjmand Samuel, Elisa Bertino, Arif Ghafoor

    Distributed workflow-based systems are widely used in various application domains including e-commerce, digital government, healthcare, manufacturing and many others. Workflows in these application domains are not restricted to the administrative boundaries of a single organization [1]. The tasks in a workflow need to be performed in a certain order and are often subject to temporal constraints and dependencies [1, 2]. A key requirement for such workflow applications is to provide the right data to the right person at the right time. This requirement motivates dynamic adaptation of workflows to deal with changing environmental conditions and exceptions.

    View the Full Publication
  • 04/01/2006Merging Source Query Interfaces on Web DatabasesEduard Dragut, Wensheng Wu, Prasad Sistla, Clement Yu, Weiyi Meng

    Many e-commerce search engines now return information from Web databases. Unlike text search engines, these e-commerce search engines have more complicated user interfaces. Our aim is to automatically construct a natural query user interface that integrates a set of interfaces over a given domain of interest. For example, each airline company has a query interface for ticket reservation, and our system can construct an integrated interface for all these companies. This permits users to access information uniformly from multiple sources. Each query interface of an e-commerce search engine is designed to make it easy for users to provide the necessary information. Specifically, (1) related pieces of information, such as first name and last name, are grouped together, and (2) certain hierarchical relationships are maintained. In this paper, we provide an algorithm to compute an integrated interface from query interfaces of the same domain. The integrated query interface can be proved to preserve the above two types of relationships. Experiments on five domains validate our theoretical study.

    View the Full Publication
  • 04/01/2006Executing MPI programs on virtual machines in an Internet sharing systemZhelong Pan, Xiaojuan Ren, Rudolf Eigenmann, Dongyan Xu

    Internet sharing systems aim at federating and utilizing distributed computing resources across the Internet. This paper presents a user-level virtual machine (VM) approach to MPI program execution in an Internet sharing framework. In this approach, the resource consumer has its own operating system running on top of, and isolated from, the operating system of the resource provider. We propose an efficient socket virtualization technique to optimize VM network performance. Socket virtualization achieves the same network bandwidth as the physical network. In our LAN environment, it reduces the latency overhead from 112% (using the existing TUN/TAP technique) to 35.6%. Performance results on MPI benchmarks show that our virtualization technique incurs small overhead compared with the physical host platform, while gaining in return a higher degree of guest isolation and customization. We also describe the key mechanisms that allow the employment of VMs in an existing Internet sharing system.

    View the Full Publication
  • 03/01/2006Exploiting predicate-window semantics over data streamsThanaa Ghanem, Walid G. Aref, Ahmed Elmagarmid

    The continuous sliding-window query model is widely used in data stream management systems, where the focus of a continuous query is limited to the set of most recent tuples. In this paper, we show that an interesting and important class of queries over data streams cannot be answered using the sliding-window query model. Thus, we introduce a new model for continuous window queries, termed the predicate-window query model, that limits the focus of a continuous query to the stream tuples that satisfy a certain predicate. Predicate-window queries have some distinguishing characteristics, e.g., (1) the window predicate can be defined over any attribute in the stream tuple (ordered or unordered), and (2) stream tuples qualify for and disqualify from the window predicate in an out-of-order manner. In this paper, we discuss the applicability of the predicate-window query model. We show how existing sliding-window query models fail to answer some predicate-window queries. Finally, we discuss the challenges in supporting the predicate-window query model in data stream management systems.
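
    A minimal sketch of the concept (an illustration of the model, not the paper's operator design): the window is the set of tuples whose latest attribute value satisfies the predicate, so an update can move a tuple into or out of the window at any time, independent of arrival order.

        class PredicateWindow:
            """Maintains the set of keys whose most recent value satisfies `pred`.
            Unlike a sliding window, membership is driven by the predicate, so a
            key can enter and leave the window out of order as updates arrive."""

            def __init__(self, pred):
                self.pred = pred
                self.window = {}  # key -> latest qualifying value

            def on_tuple(self, key, value):
                if self.pred(value):
                    action = "update" if key in self.window else "enter"
                    self.window[key] = value
                else:
                    action = "leave" if key in self.window else "ignore"
                    self.window.pop(key, None)
                return action

        # Track stocks whose price is above 50, from a stream of (symbol, price).
        w = PredicateWindow(lambda price: price > 50)
        for sym, price in [("IBM", 60), ("ACME", 40), ("IBM", 45), ("ACME", 70)]:
            print(sym, price, "->", w.on_tuple(sym, price), sorted(w.window))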

    View the Full Publication
  • 01/01/2006Challenges in spatiotemporal stream query optimizationHicham Elmongui, Mourad Ouzzani, Walid G. Aref

    Simplified technology and low costs have spurred the use of location-detection devices in moving objects. Usually, these devices send the moving objects' location information to a spatio-temporal data stream management system, which is then responsible for answering spatio-temporal queries related to these moving objects. A large spectrum of research has been devoted to continuous spatio-temporal query processing. However, we argue that several outstanding challenges have been addressed only partially or not at all in the existing literature. In particular, in this paper, we focus on the optimization of multi-predicate spatio-temporal queries on moving objects. We present several major challenges related to the lack of spatio-temporal pipelined operators, and to the impact of time, space, and their combination on query plan optimality under different query and object distributions. We show that building an adaptive query optimization framework is key to addressing these challenges and coping with the dynamic nature of the environment.

    View the Full Publication
  • 01/01/2006Discovering Consensus Patterns in Biological DatabasesMohamed Eltabakh, Walid G. Aref, Mourad Ouzzani, Mohamed Ali

    Consensus patterns, like motifs and tandem repeats, are highly conserved patterns with very few substitutions where no gaps are allowed. In this paper, we present a progressive hierarchical clustering technique for discovering consensus patterns in biological databases over a certain length range. This technique can discover consensus patterns with various requirements by applying a post-processing phase. The progressive nature of the hierarchical clustering algorithm makes it scalable and efficient. Experiments to discover motifs and tandem repeats on real biological databases show significant performance gain over non-progressive clustering techniques.
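
    As a loose, greedy stand-in for the paper's progressive hierarchical clustering (all names and the assignment rule here are our own simplifications), the sketch below clusters fixed-length substrings by substitution distance and reports the well-populated clusters, which is the flavor of consensus-pattern discovery:

        def kmers(seq, k):
            return [seq[i:i+k] for i in range(len(seq) - k + 1)]

        def hamming(a, b):
            return sum(x != y for x, y in zip(a, b))

        def cluster_kmers(sequences, k, max_dist):
            """Assign each k-mer to the first cluster whose representative is within
            `max_dist` substitutions (no gaps), then keep frequent clusters."""
            clusters = []  # list of (representative, members)
            for seq in sequences:
                for km in kmers(seq, k):
                    for rep, members in clusters:
                        if hamming(rep, km) <= max_dist:
                            members.append(km)
                            break
                    else:
                        clusters.append((km, [km]))
            return [(rep, members) for rep, members in clusters if len(members) >= 3]

        seqs = ["ACGTACGTAC", "ACGAACGTTC"]
        for rep, members in cluster_kmers(seqs, k=4, max_dist=1):
            print(rep, len(members), "occurrences")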

    View the Full Publication
  • 01/01/2006A semantic approach to build personalized interfaces in the cultural heritage domainStefano Valtolina, Pietro Mazzoleni, Stefano Franzoni, Elisa Bertino

    In this paper we present a system we have built to disseminate cultural heritage distributed across multiple museums. Our system addresses the requirements of two categories of users: end users, who need to access information according to their interests and interaction preferences, and domain experts and museum curators, who need to develop thematic tours providing end users with a better understanding of a single artefact or collection. In our approach we make use of a semantic representation of the given heritage domain to build multiple visual interfaces, called "Virtual Wings" (VWs). Such interfaces allow users to navigate through data available from digital archives and thematic tours and to create their own personalized virtual visits. As an example VW, we present an interactive application integrating personalized digital guides (using PDAs) and 360-degree panoramic images.

    View the Full Publication
  • 01/01/2006Privacy preserving multi-factor authentication with biometricsAbhilasha Bhargav-Spantzel, Anna Squicciarini, Elisa Bertino

    An emerging approach to reducing identity theft is the adoption of biometric authentication systems. Such systems, however, present several challenges related to the privacy, reliability, and security of biometric data. Interoperability is also required among the devices used for authentication. Moreover, biometric authentication by itself is often not sufficient as a conclusive proof of identity and has to be complemented with other proofs of identity such as passwords, SSNs, or other user identifiers. Multi-factor authentication mechanisms are thus required to enforce strong authentication based on biometric data together with identifiers of other kinds. In this paper we provide a two-phase authentication mechanism for federated identity management systems. The first phase consists of a two-factor biometric authentication based on zero-knowledge proofs. We employ techniques from the vector-space model to generate cryptographic biometric keys. These keys are kept secret, thus preserving the confidentiality of the biometric data, while still exploiting the advantages of biometric authentication. The second phase combines several authentication factors in conjunction with the biometric to provide strong authentication. A key advantage of our approach is that any unanticipated combination of factors can be used. Such an authentication system leverages the user information available from the federated identity management system.

    View the Full Publication
  • 01/01/2006Achieving Anonymity in Mobile Ad Hoc Networks Using Fuzzy Position InformationXiaoxin Wu, Jun Liu, Xiaoyan Hong, Elisa Bertino

    Traditionally, the anonymity of an entity of interest can be achieved by hiding it among a group of other entities with similar characteristics, i.e., an anonymity set. In mobile ad hoc networks, generating and maintaining such an anonymity set for any ad hoc node is challenging because of node mobility and the consequently dynamic network topology. In this paper, we address the problem of destination anonymity. We propose protocols that use a fuzzy destination position to generate a geographic area called the anonymity zone (AZ). A packet for a destination is delivered to all the nodes in the AZ, which, consequently, make up the anonymity set. The size of the anonymity set may decrease because nodes are mobile, yet the corresponding management of the anonymity set is simple. We design techniques to further improve node anonymity. We use extensive simulation to study node anonymity and routing performance, and to determine the parameters that most impact the anonymity level that can be achieved by our protocol.
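
    As a toy illustration of the fuzzy-position idea (the parameters and fuzzing scheme below are assumptions, not the paper's protocol): the sender perturbs the destination's coordinates, and every node within a fixed radius of the perturbed point belongs to the anonymity zone and receives the packet.

        import math
        import random

        def fuzz_position(x, y, max_offset, rng=random.Random(42)):
            """Perturb the destination's true position by a bounded random offset."""
            angle = rng.uniform(0, 2 * math.pi)
            r = rng.uniform(0, max_offset)
            return x + r * math.cos(angle), y + r * math.sin(angle)

        def anonymity_zone(nodes, center, radius):
            """All nodes within `radius` of the fuzzed center form the anonymity set."""
            cx, cy = center
            return [nid for nid, (x, y) in nodes.items()
                    if math.hypot(x - cx, y - cy) <= radius]

        nodes = {"dest": (10, 10), "n1": (12, 9), "n2": (30, 40), "n3": (8, 13)}
        center = fuzz_position(10, 10, max_offset=3)
        print(anonymity_zone(nodes, center, radius=6))  # the destination hides among these nodes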

    View the Full Publication
  • 01/01/2006Digital identity management and protectionElisa Bertino

    Digital identity management (DIM) has emerged as a critical foundation for supporting successful interaction in today's globally interconnected society. It is crucial not only for the conduct of business and government but also for a large and growing body of electronic or online social interactions. Digital identity management is usually coupled with the notion of federation. The goal of federations is to provide users with protected environments to federate identities by the proper management of identity attributes. Federations provide a controlled method by which federation members can provide more integrated and complete services to a qualified group of individuals within certain sets of business transactions. By controlling the scope of access to participating sites, and by enabling secure, cross-domain transmission of users' personal information, federations can make the perpetration of identity fraud more difficult, as well as reduce its frequency and potential impact. In this talk we first discuss basic digital identity concepts and requirements for DIM solutions, and overview relevant initiatives currently under way in academia and industry. We then focus on the problem of identity theft and discuss an initial solution to the problem of establishing and protecting digital identity.

    View the Full Publication
  • 01/01/2006Preserving User Location Privacy in Mobile Data Management InfrastructuresReynold Cheng, Yu Zhang, Elisa Bertino, Sunil Prabhakar

    Location-based services, such as finding the nearest gas station, require users to supply their location information. However, a user's location can be tracked without her consent or knowledge. Lowering the spatial and temporal resolution of location data sent to the server has been proposed as a solution. Although this technique is effective in protecting privacy, it may be overkill, and the quality of desired services can be severely affected. In this paper, we suggest a framework where uncertainty can be controlled to provide high-quality and privacy-preserving services, and investigate how such a framework can be realized in GPS and cellular network systems. Based on this framework, we suggest a data model that augments location data with uncertainty, and propose imprecise queries that hide the location of the query issuer and yield probabilistic results. We investigate the evaluation and quality aspects of a range query. We also provide novel methods to protect our solutions against trajectory tracing. Experiments are conducted to examine the effectiveness of our approaches.
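
    A minimal sketch of the probabilistic flavor of such queries, assuming for illustration only a uniform distribution over a rectangular uncertainty region: the answer to a range query is the fraction of a user's uncertainty region that overlaps the query rectangle.

        def overlap_1d(a_lo, a_hi, b_lo, b_hi):
            return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))

        def prob_in_range(region, query):
            """region, query: (x_lo, y_lo, x_hi, y_hi). Assuming the user is uniformly
            distributed over `region`, return P(user inside `query`)."""
            rx = overlap_1d(region[0], region[2], query[0], query[2])
            ry = overlap_1d(region[1], region[3], query[1], query[3])
            area = (region[2] - region[0]) * (region[3] - region[1])
            return (rx * ry) / area if area > 0 else 0.0

        # The user reports only a 10x10 uncertainty region instead of an exact point.
        region = (0, 0, 10, 10)
        query = (5, 5, 20, 20)   # "who is inside this rectangle?"
        print(prob_in_range(region, query))  # 0.25: a probabilistic, privacy-friendly answer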

    View the Full Publication
  • 01/01/2006Traceable and Automatic Compliance of Privacy Policies in Federated Digital Identity ManagementAnna Squicciarini, Abhilasha Bhargav-Spantzel, Alexie Czeskis, Elisa Bertino

    Digital identity is defined as the digital representation of the information known about a specific individual or organization. An emerging approach for protecting the identities of individuals while at the same time enhancing user convenience is to focus on inter-organizational management of identity information. This is referred to as federated identity management. In this paper we develop an approach to support privacy-controlled sharing of identity attributes and harmonization of privacy policies in federated environments. Policy harmonization mechanisms make it possible to determine whether the transfer of identity attributes from one entity to another violates the privacy policies stated by the former. We also provide mechanisms for tracing the release of users' identity attributes within the federation. Such an approach entails a form of accountability, since an entity non-compliant with the user's original privacy preferences can be identified. Finally, a comprehensive security analysis detailing the security properties of the approach is also offered.

    View the Full Publication
  • 01/01/2006Beyond k-Anonymity: A Decision Theoretic Framework for Assessing Privacy RiskGuy Lebanon, Monica Scannapieco, Mohamed Fouad, Elisa Bertino

    An important issue any organization or individual has to face when managing data containing sensitive information is the risk that can be incurred when releasing such data. Even though data may be sanitized before being released, it is still possible for an adversary to reconstruct the original data by using additional information that may be available, for example, from other data sources. To date, however, no comprehensive approach exists to quantify such risks. In this paper we develop a framework, based on statistical decision theory, to assess the relationship between the disclosed data and the resulting privacy risk. We relate our framework to the k-anonymity disclosure method; we make the assumptions behind k-anonymity explicit, quantify them, and extend them in several natural directions.
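
    To give the decision-theoretic framing a concrete shape (a toy model under assumptions of ours, not the paper's estimator): the risk of a release can be scored as an expected loss, summing each record's disclosure loss weighted by a naive re-identification probability, here one over the size of the record's quasi-identifier group.

        from collections import Counter

        def expected_privacy_risk(rows, quasi_ids, loss):
            """Toy decision-theoretic score: each record's loss is weighted by a
            naive re-identification probability 1/|group|, where the group is the
            set of records sharing its quasi-identifier values."""
            groups = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
            return sum(loss(r) / groups[tuple(r[q] for q in quasi_ids)] for r in rows)

        rows = [
            {"zip": "479**", "age": "30-40", "sensitive": 10.0},
            {"zip": "479**", "age": "30-40", "sensitive": 10.0},
            {"zip": "606**", "age": "50-60", "sensitive": 50.0},  # unique, hence risky
        ]
        print(expected_privacy_risk(rows, ["zip", "age"], loss=lambda r: r["sensitive"]))  # 60.0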

    View the Full Publication
  • 01/01/2006Architectural issues for a location-aware role-based access control systemMaria Luisa Damiani, Elisa Bertino

    An increasingly important category of location-based services (LBS), responding to the demands of mobility in organizations, is represented by Enterprise LBS (E-LBS). E-LBS pose challenging requirements, including the need for selective access to the services based on the position of mobile users and on spatially bounded organizational functions. To address these requirements, a general architectural framework for an access control system based on location-aware roles is introduced.

    View the Full Publication
  • 01/01/2006Access control, confidentiality and privacy for video surveillance databasesBhavani Thuraisingham, Gal Lavee, Elisa Bertino, Jianping Fan, Latifur Khan

    In this paper we address confidentiality and privacy for video surveillance databases. First we discuss our overall approach for suspicious event detection. Next we discuss an access control model and access control algorithms for confidentiality. Finally we discuss privacy-preserving video surveillance. Our goal is to build a comprehensive system that can detect suspicious events and ensure both confidentiality and privacy.

    View the Full Publication
  • 01/01/2006Systematic control and management of data integrityJi-Won Byun, Yonglak Sohn, Elisa Bertino

    Integrity has long been considered a fundamental requirement for secure computerized systems, and today the demand for data integrity is stronger than ever as many organizations increase their reliance on data and information systems. A number of recently enacted data privacy regulations also require high integrity for personal data. In this paper, we discuss various issues concerning the systematic control and management of data integrity, with a primary focus on access control. We first examine some previously proposed integrity models and define a set of integrity requirements. We then present an architecture for comprehensive integrity control systems, based on data validation and metadata management. We also provide an integrity control policy language that we believe is flexible and intuitive.

    View the Full Publication
  • 01/01/2006XACML policy integration algorithms: not to be confused with XACML policy combination algorithms!Pietro Mazzoleni, Elisa Bertino, Bruno Crispo, Swaminathan Sivasubramanian

    XACML is the OASIS standard language for the specification of authorization and entitlement policies. However, while XACML addresses well the security requirements of a single enterprise (even one that is large and composed of multiple departments), it does not address the requirements of virtual enterprises built through the collaboration of several autonomous subjects sharing their resources. In this paper we highlight such limitations and propose an XACML extension, the policy integration algorithm, to address them. We also discuss in which respects the process of comparing two XACML policies differs from the process used to compare other business rules.

    View the Full Publication
  • 01/01/2006Fine-grained role-based delegation in presence of the hybrid role hierarchyJames Joshi, Elisa Bertino

    Delegation of authority is an important process that needs to be captured by any access control model. In role-based access control models, delegation of authority involves delegating roles that a user can assume, or the set of permissions that he can acquire, to other users. Several role-based delegation models have been proposed in the literature. However, these models consider delegation only in the presence of the general hierarchy type. Multiple hierarchy types have been proposed in the context of the Generalized Temporal Role-based Access Control (GTRBAC) model, where it has been shown that multiple hierarchy semantics are desirable for expressing fine-grained access control policies. In this paper, we address role-based delegation schemes in the context of hybrid hierarchies and elaborate on fine-grained delegation schemes. In particular, we show that upward delegation, which has been considered as having no practical use, is a desirable feature. Furthermore, we show that accountability must be considered as an important factor during the delegation process. The proposed delegation framework subsumes the delegation schemes proposed in earlier role-based delegation models and provides much more fine-grained control of delegation semantics.

    View the Full Publication
  • 01/01/2006Secure Anonymization for Incremental DatasetsJi-Won Byun, Yonglak Sohn, Elisa Bertino, Ninghui Li

    Data anonymization techniques based on the k-anonymity model have been the focus of intense research in the last few years. Although the k-anonymity model and the related techniques provide valuable solutions to data privacy, current solutions are limited to static data release (i.e., the entire dataset is assumed to be available at the time of release). While this may be acceptable in some applications, today we see databases that grow continuously, every day and even every hour. In such dynamic environments, the current techniques may suffer from poor data quality and/or vulnerability to inference. In this paper, we analyze various inference channels that may exist among multiple anonymized datasets and discuss how to avoid such inferences. We then present an approach to securely anonymizing a continuously growing dataset in an efficient manner while assuring high data quality.

    View the Full Publication
  • 01/01/2006Controlled and cooperative updates of XML documents in byzantine and failure-prone distributed systemsGiovanni Mella, Elena Ferrari, Elisa Bertino, Yunhua Koglin

    This paper proposes an infrastructure and related algorithms for the controlled and cooperative updates of XML documents. Key components of the proposed system are a set of XML-based languages for specifying access control policies and the path that the document must follow during its update. Such a path can be fully specified before the update process begins, or can be dynamically modified by properly authorized subjects while the document is being transmitted. Our approach is fully distributed in that each party involved in the process can verify the correctness of the operations performed on the document up to that point without relying on a central authority. More importantly, the recovery procedure also does not need the participation of a central authority. Our approach is based on special control information that is transmitted together with the document, and on a suite of protocols. We formally specify the structure of such control information and the protocols, and analyze the security and complexity of the proposed protocols.

    View the Full Publication
  • 01/01/2006Policies and IT Technologies: A Puzzle of Two PiecesElisa Bertino, Steve Ruth

    This new public policy technology track will appear in each installment of IEEE Internet Computing in 2006 and will cover a wide range of topics. The authors describe their vision of what to expect in future issues, along with a call to arms to build a like-minded community.

    View the Full Publication
  • 01/01/2006Achieving Privacy in Trust Negotiations with an Ontology-Based ApproachAnna Squicciarini, Elisa Bertino, Elena Ferrari, Indrakshi Ray

    The increasing use of the Internet in a variety of distributed multiparty interactions and transactions with strong real-time requirements has pushed the search for solutions to the problem of attribute-based digital interactions. A promising solution today is represented by automated trust negotiation systems. Trust negotiation systems allow subjects in different security domains to securely exchange protected resources and services. These systems, however, by their nature may represent a threat to privacy, in that credentials exchanged during negotiations often contain sensitive personal information that may need to be selectively released. In this paper, we address the problem of preserving privacy in trust negotiations. We introduce the notion of a privacy-preserving disclosure, that is, a disclosure set that does not include attributes or credentials, or combinations of these, that may compromise privacy. To obtain privacy-preserving disclosure sets, we propose two techniques based on the notions of substitution and generalization. We argue that formulating trust negotiation requirements in terms of disclosure policies is often restrictive. To solve this problem, we show how trust negotiation requirements can be expressed as property-based policies that list the properties needed to obtain a given resource. To better address this issue, we introduce the notion of a reference ontology and formalize the notion of a trust requirement. Additionally, we develop an approach to derive disclosure policies from trust requirements and formally state some semantic relationships (i.e., equivalence, stronger-than) that may hold between policies. These relationships can be used by a credential requestor to reason about which disclosure policies he/she should use in a trust negotiation.

    View the Full Publication
  • 01/01/2006X-FEDERATE: A Policy Engineering Framework for Federated Access ManagementRafae Bhatti, Elisa Bertino, Arif Ghafoor

    Policy-Based Management (PBM) has been considered as a promising approach for design and enforcement of access management policies for distributed systems. The increasing shift toward federated information sharing in the organizational landscape, however, calls for revisiting current PBM approaches to satisfy the unique security requirements of the federated paradigm. This presents a twofold challenge for the design of a PBM approach, where, on the one hand, the policy must incorporate the access management needs of the individual systems, while, on the other hand, the policies across multiple systems must be designed in such a manner that they can be uniformly developed, deployed, and integrated within the federated system. In this paper, we analyze the impact of security management challenges on policy design and formulate a policy engineering methodology based on principles of software engineering to develop a PBM solution for federated systems. We present X-FEDERATE, a policy engineering framework for federated access management using an extension of the well-known Role-Based Access Control (RBAC) model. Our framework consists of an XML-based policy specification language, its UML-based meta-model, and an enforcement architecture. We provide a comparison of our framework with related approaches and highlight its significance for federated access management. The paper also presents a federation protocol and discusses a prototype of our framework that implements the protocol in a federated digital library environment.

    View the Full Publication
  • 01/01/2006A multigranular object-oriented framework supporting spatio-temporal granularity conversionsElena Camossi, Michela Bertolotto, Elisa Bertino

    Several application domains require handling spatio-temporal data. However, traditional Geographic Information Systems (GIS) and database models do not adequately support the temporal aspects of spatial data. A crucial issue relates to the choice of the appropriate granularity. Unfortunately, while a formalisation of the concept of temporal granularity has been proposed and widely adopted, no consensus exists on the notion of spatial granularity. In this paper, we address these open problems by proposing a formal definition of spatial granularity and by designing a spatio-temporal framework for the management of spatial and temporal information at different granularities. We present a spatio-temporal extension of the ODMG type system with specific types for defining multigranular spatio-temporal properties. Granularity conversion functions are introduced to obtain attribute values at different spatial and temporal granularities.
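
    As a small illustration of what a granularity conversion function does (a sketch under our own assumptions, not the paper's ODMG type extension): a coarsening conversion maps values recorded at a fine temporal granularity, such as days, to a coarser one, such as months, via an aggregation function.

        from collections import defaultdict
        from datetime import date
        from statistics import mean

        def coarsen_temporal(values, to_granule, aggregate=mean):
            """values: {date -> measurement} at a fine granularity (days).
            to_granule: maps each date to its coarser granule, e.g. (year, month).
            Returns the aggregated value per coarse granule."""
            granules = defaultdict(list)
            for d, v in values.items():
                granules[to_granule(d)].append(v)
            return {g: aggregate(vs) for g, vs in granules.items()}

        daily = {date(2006, 1, 1): 10.0, date(2006, 1, 15): 14.0, date(2006, 2, 1): 20.0}
        monthly = coarsen_temporal(daily, to_granule=lambda d: (d.year, d.month))
        print(monthly)  # {(2006, 1): 12.0, (2006, 2): 20.0}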

    View the Full Publication
  • 01/01/2006Micro-views, or on how to protect privacy while enhancing data usability: concepts and challengesJi-Won Byun, Elisa Bertino

    The large availability of repositories storing various types of information about individuals has raised serious privacy concerns over the past decade. Nonetheless, database technology is far from providing adequate solutions to this problem, which requires a delicate balance between an individual's privacy and convenience and data usability by enterprises and organizations - a database which is rigid and over-protective may render data of little value. Though these goals may seem at odds, we argue that the development of solutions able to reconcile them will be an important challenge to be addressed in the next few years. We believe that the next wave of database technology will be represented by a DBMS that provides high-assurance privacy and security. In this paper, we elaborate on such challenges. In particular, we argue that we need to provide different views of data at a very fine level of granularity: conventional view technology can select only down to a single attribute value for a single tuple, and we need to go beyond even this level. That is, we need a mechanism by which even a single value inside a tuple's attribute may have different views; we refer to them as micro-views. We believe that such a mechanism can be an important building block, together with other mechanisms and tools, of the next wave of database technology.

    View the Full Publication
  • 01/01/2006Ws-AC: A Fine Grained Access Control System for Web ServicesElisa Bertino, Anna Squicciarini, Ivan Paloscia, Lorenzo D. Martino

    The emerging Web service technology has enabled the development of Internet-based applications that integrate distributed and heterogeneous systems and processes owned by different organizations. However, while Web services are rapidly becoming a fundamental paradigm for the development of complex Web applications, several security issues still need to be addressed. Among the various open issues concerning security, an important one is the development of suitable access control models, able to restrict access to Web services to authorized users. In this paper we present an innovative access control model for Web services. The model is characterized by a number of key features, including identity attributes and service negotiation capabilities. We formally define the protocol for carrying out negotiations by specifying the types of messages to be exchanged and their contents, based on which requestor and provider can reach an agreement about security requirements and services. We also discuss the architecture of the prototype we are currently implementing. As part of the architecture, we propose a mechanism for mapping our policies onto the WS-Policy standard, which provides a standardized grammar for expressing Web service policies.

    View the Full Publication
  • 01/01/2006Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic RelationshipsEduard Dragut, Ramon Lawrence

    Defining and composing mappings are fundamental operations required in any data sharing architecture (e.g. data warehouse, data integration). Mapping composition is used to generate new mappings from existing ones and is useful when no direct mapping is available. The complexity of mapping composition depends on the amount of syntactic and semantic information in the mapping. The composition of mappings has proven to be inefficient to compute in many situations unless the mappings are simplified to binary relationships that represent “similarity” between concepts. Our contribution is an algorithm for composing metadata mappings that capture explicit semantics in terms of binary relationships. Our approach allows the hard cases of mapping composition to be detected and semi-automatically resolved, and thus reduces the manual effort required during composition. We demonstrate how the mapping composition algorithm is used to produce a direct mapping between schemas from independently produced schema-to-ontology mappings. An experimental evaluation shows that composing semantic mappings results in a more accurate composition result compared to composing mappings as morphisms.
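
    A minimal sketch of composing mappings expressed as typed binary relationships (the relationship vocabulary and composition table here are illustrative assumptions, not the paper's algorithm): composing an equivalence with an equivalence stays an equivalence, while composing through a subsumption weakens the result.

        # Hypothetical relationship types: '=' (equivalent), '<' (more specific than).
        # Composition table: what relationship holds across two mapping steps.
        COMPOSE = {("=", "="): "=", ("=", "<"): "<", ("<", "="): "<", ("<", "<"): "<"}

        def compose(map_ab, map_bc):
            """map_ab: {(a, b): rel}, map_bc: {(b, c): rel}.
            Returns the composed mapping {(a, c): rel} wherever a chain exists."""
            result = {}
            for (a, b1), r1 in map_ab.items():
                for (b2, c), r2 in map_bc.items():
                    if b1 == b2 and (r1, r2) in COMPOSE:
                        result[(a, c)] = COMPOSE[(r1, r2)]
            return result

        # Two schema-to-ontology mappings yield a direct schema-to-schema mapping.
        schema1_to_onto = {("emp.name", "Person.name"): "=", ("emp.car", "Vehicle"): "<"}
        onto_to_schema2 = {("Person.name", "staff.fullname"): "=", ("Vehicle", "asset.item"): "<"}
        print(compose(schema1_to_onto, onto_to_schema2))
        # {('emp.name', 'staff.fullname'): '=', ('emp.car', 'asset.item'): '<'}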

    View the Full Publication
  • 01/01/2006Meaningful Labeling of Integrated Query InterfacesEduard Dragut, Clement Yu, Weiyi Meng

    The contents of Web databases are accessed through queries formulated on complex user interfaces. In many domains of interest (e.g. Auto) users are interested in obtaining information from alternative sources. Thus, they have to access many individual Web databases via query interfaces. We aim to construct automatically a well-designed query interface that integrates a set of interfaces in the same domain. This will permit users to access information uniformly from multiple sources. Earlier research in this area includes matching attributes across multiple query interfaces in the same domain and grouping related attributes. In this paper, we investigate the naming of the attributes in the integrated query interface. We provide a set of properties which are required in order to have consistent labels for the attributes within an integrated interface so that users have no difficulty in understanding it. Based on these properties, we design algorithms to systematically label the attributes. Experimental results on seven domains validate our theoretical study. In the process of naming attributes, a set of logical inference rules among the textual labels is discovered. These inferences are also likely to be applicable to other integration problems sensitive to naming: HTML forms, HTML tables or concept hierarchies in the semantic Web.

    View the Full Publication
  • 01/01/2006Measuring High-Performance Computing with Real ApplicationsMohamed Sayeed, Hansang Bae, Yili Zheng, Brian Armstrong, Rudolf Eigenmann, Faisal Saied

    The authors discuss the important questions that benchmarking must answer and the degree to which such answers can be given by existing kernel versus real application benchmarks. They describe the state of the art and challenges that must be met to base needed performance measurements on real applications. Finally, they quantify their claims by measuring and comparing several real applications and kernel benchmarks. An important finding is that all three measured computer platforms performed both the best and the worst across the selected applications; "best performance" significantly depended on the problem being solved, questioning the value of computer rankings that use simplistic metrics.

    View the Full Publication
  • 01/01/2006Implementing Tomorrow’s Programming LanguagesRudolf Eigenmann

    Compilers are the critical translators that convert a human-readable program into the code understood by the machine. While this transformation is already sophisticated today, tomorrow's compilers face a tremendous challenge. There is a demand to provide languages that are much higher level than today's C, Fortran, or Java. On the other hand, tomorrow's machines are more complex than today's; they involve multiple cores and may span the planet via compute Grids. How can we expect compilers to provide efficient implementations? I will describe a number of related research efforts that try to tackle this problem. Composition builds a way towards higher-level programming languages. Automatic translation of shared-address-space models to distributed-memory architectures may lead to higher productivity than current message passing paradigms. Advanced symbolic analysis techniques equip compilers with capabilities to reason about programs in abstract terms. Last but not least, through auto-tuning, compilers make effective decisions, even though there may be insufficient information at compile time.

    View the Full Publication
  • 01/01/2006Can Transactions Enhance Parallel Programs?Troy Johnson, Sang-lk Lee, Seung-Jai Min, Rudolf Eigenmann

    Transactional programming constructs have been proposed as key elements of advanced parallel programming models. Currently, it is not well understood to what extent such constructs enable efficient parallel program implementations and ease parallel programming beyond what is possible with existing techniques. To help answer these questions, we investigate the technology underlying transactions and compare it to existing parallelization techniques. We also consider the most important parallelizing transformation techniques and look for opportunities to further improve them through transactional constructs or – vice versa – to improve transactions with these transformations. Finally, we evaluate the use of transactions in the SPEC OMP benchmarks.

    View the Full Publication
  • 01/01/2006Fast, automatic, procedure-level performance tuningZhelong Pan, Rudolf Eigenmann

    This paper presents an automated performance tuning solution, which partitions a program into a number of tuning sections and finds the best combination of compiler options for each section. Our solution builds on prior work on feedback-driven optimization, which tuned the whole program, instead of each section. Our key novel algorithm partitions a program into appropriate tuning sections. We also present the architecture of a system that automates the tuning process; it includes several pre-tuning steps that partition and instrument the program, followed by the actual tuning and the post-tuning assembly of the individually-optimized parts. Our system, called PEAK, achieves fast tuning speed by measuring a small number of invocations of each code section, instead of the whole-program execution time, as in common solutions. Compared to these solutions PEAK reduces tuning time from 2.19 hours to 5.85 minutes on average, while achieving similar program performance. PEAK improves the performance of SPEC CPU2000 FP benchmarks by 12% on average over GCC O3, the highest optimization level, on a Pentium IV machine.

    View the Full Publication
  • 01/01/2006Context-sensitive domain-independent algorithm composition and selectionTroy Johnson, Rudolf Eigenmann

    Progressing beyond the productivity of present-day languages appears to require using domain-specific knowledge. Domain-specific languages and libraries (DSLs) proliferate, but most optimizations and language features have limited portability because each language's semantics are related closely to its domain. We explain how any DSL compiler can use a domain-independent AI planner to implement algorithm composition as a language feature. Our notion of composition addresses a common DSL problem: good library designers tend to minimize redundancy by including only fundamental procedures that users must chain together into call sequences. Novice users are confounded by not knowing an appropriate sequence to achieve their goal. Composition allows the programmer to define and call an abstract algorithm (AA) like a procedure. The compiler replaces an AA call with a sequence of library calls, while considering the calling context. Because AI planners compute a sequence of operations to reach a goal state, the compiler can implement composition by analyzing the calling context to provide the planner's initial state. Nevertheless, mapping composition onto planning is not straightforward because applying planning to software requires extensions to classical planning, and procedure specifications may be incomplete when expressed in a planning language. Compositions may not be provably correct, so our approach mitigates semantic incompleteness with unobtrusive programmer-compiler interaction. This tradeoff is key to making composition a practical and natural feature of otherwise imperative languages, whose users eschew complex logical specifications. Compositions satisfying an AA may not be equal in performance, memory usage, or precision and require selection of a preferred solution. We examine language design and implementation issues, and we perform a case study on the BioPerl bioinformatics library.
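
    To make the composition-as-planning idea concrete, here is a minimal sketch (a toy breadth-first planner over hypothetical library operations, not the paper's AI planner or the BioPerl API): each library call is modeled by the facts it requires and produces, and an abstract algorithm call is replaced by any call sequence that reaches the goal facts from the calling context's initial facts.

        from collections import deque

        # Hypothetical library operations: name -> (required facts, produced facts).
        OPS = {
            "read_fasta": ({"filename"}, {"raw_seq"}),
            "clean_seq":  ({"raw_seq"}, {"seq"}),
            "translate":  ({"seq"}, {"protein"}),
            "align":      ({"protein"}, {"alignment"}),
        }

        def plan(initial, goal):
            """Breadth-first search for a shortest call sequence turning the facts
            in `initial` into a state containing all facts in `goal`."""
            queue = deque([(frozenset(initial), [])])
            seen = {frozenset(initial)}
            while queue:
                state, calls = queue.popleft()
                if goal <= state:
                    return calls
                for op, (req, prod) in OPS.items():
                    if req <= state:
                        nxt = frozenset(state | prod)
                        if nxt not in seen:
                            seen.add(nxt)
                            queue.append((nxt, calls + [op]))
            return None  # no composition satisfies the abstract algorithm

        # The compiler would replace an abstract call like get_alignment(filename):
        print(plan({"filename"}, {"alignment"}))
        # ['read_fasta', 'clean_seq', 'translate', 'align']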

    View the Full Publication
  • 01/01/2006Optimizing irregular shared-memory applications for distributed-memory systemsAyon Basumallik, Rudolf Eigenmann

    In prior work, we have proposed techniques to extend the ease of shared-memory parallel programming to distributed-memory platforms by automatic translation of OpenMP programs to MPI. In the case of irregular applications, the performance of this translation scheme is limited by the fact that accesses to shared-data cannot be accurately resolved at compile-time. Additionally, irregular applications with high communication to computation ratios pose challenges even for direct implementation on message passing systems. In this paper, we present combined compile-time/run-time techniques for optimizing irregular shared-memory applications on message passing systems in the context of automatic translation from OpenMP to MPI. Our transformations enable computation-communication overlap by restructuring irregular parallel loops. The compiler creates inspectors to analyze actual data access patterns for irregular accesses at runtime. This analysis is combined with the compile-time analysis of regular data accesses to determine which iterations of irregular loops access non-local data. The iterations are then reordered to enable computation-communication overlap. In the case where the irregular access occurs inside nested loops, the loop nest is restructured. We evaluate our techniques by translating OpenMP versions of three benchmarks from two important classes of irregular applications - sparse matrix computations and molecular dynamics. We find that for these applications, on sixteen nodes, versions employing computation-communication overlap are almost twice as fast as baseline OpenMP-to-MPI versions, almost 30% faster than inspector-only versions, almost 25% faster than hand-coded versions on two applications and about 9% slower on the third.
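
    A minimal sketch of the inspector idea for irregular accesses (our own simplification; the paper's compiler additionally restructures loops so communication overlaps computation): at run time, the inspector scans the indirection array and splits loop iterations into those touching only locally owned data, which can run immediately, and those needing remote data, which must wait for communication.

        def inspect_iterations(index_array, owned_lo, owned_hi):
            """Partition iterations of `for i: ... x[idx[i]] ...` by data locality.
            Iterations touching only the locally owned block [owned_lo, owned_hi)
            can be computed while communication for the rest is in flight."""
            local, remote = [], []
            for i, target in enumerate(index_array):
                (local if owned_lo <= target < owned_hi else remote).append(i)
            return local, remote

        # This process owns array elements [0, 4); the indirection array is known
        # only at run time, which is why a compile-time-only analysis falls short.
        idx = [1, 7, 3, 0, 5, 2]
        local_iters, remote_iters = inspect_iterations(idx, 0, 4)
        print("run now (local):", local_iters)           # [0, 2, 3, 5]
        print("run after comm (remote):", remote_iters)  # [1, 4]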

    View the Full Publication
  • 01/01/2006Exploiting reference idempotency to reduce speculative storage overflowSeon Wook Kim, Chong-Liang Ooi, Rudolf Eigenmann, Babak Falsafi, T.N. Vijaykumar

    Recent proposals for multithreaded architectures employ speculative execution to allow threads with unknown dependences to execute speculatively in parallel. The architectures use hardware speculative storage to buffer speculative data, track data dependences and correct incorrect executions through roll-backs. Because all memory references access the speculative storage, current proposals implement speculative storage using small memory structures to achieve fast access. The limited capacity of the speculative storage causes considerable performance loss due to speculative storage overflow whenever a thread's speculative state exceeds the speculative storage capacity. Larger threads exacerbate the overflow problem but are preferable to smaller threads, as larger threads uncover more parallelism. In this article, we discover a new program property called memory reference idempotency. Idempotent references are guaranteed to be eventually corrected, though they may be temporarily incorrect in the process of speculation. Therefore, idempotent references, even from non-parallelizable program sections, need not be tracked in the speculative storage and can instead directly access non-speculative storage (i.e., the conventional memory hierarchy). Thus, we reduce the demand for speculative storage space in large threads. We define a formal framework for reference idempotency and present a novel compiler-assisted speculative execution model. We prove the necessary and sufficient conditions for reference idempotency under our model. We present a compiler algorithm to label idempotent memory references for the hardware. Experimental results show that, for our benchmarks, over 60% of the references in non-parallelizable program sections are idempotent.

    View the Full Publication
  • 01/01/2006Policy-based security management for federated healthcare databases (or RHIOs)Rafae Bhatti, Khalid Moidu, Arif Ghafoor

    The role of security management in RHIOs has recently gained increasing attention due to strict privacy and disclosure rules and federal regulations such as HIPAA. The envisioned use of electronic health care records in such systems involves pervasive and ubiquitous access to healthcare information from anywhere outside of traditional hospital boundaries, which puts increasing demands on the underlying security mechanisms. In this paper, we design a context-aware policy-based system to provide security management for health informatics. The policies are based on a set of use cases developed for the HL7 Clinical Document Architecture (CDA) standard. Our system is designed to adapt well to ubiquitous healthcare services in a non-traditional, pervasive environment using the same infrastructure that enables federated healthcare management within traditional organizational boundaries. We also present an enforcement architecture and a demonstration prototype for the proposed policy-based system.

    View the Full Publication
  • 01/01/2006A semi-static approach to mapping dynamic iterative tasks ontoYu-Kwong Kwok, Anthony Maciejewski, Howard Siegel, Ishfaq Ahmad

    Minimization of the execution time of an iterative application in a heterogeneous parallel computing environment requires an appropriate mapping scheme for matching and scheduling the subtasks of a given application onto the processors. Often, some of the characteristics of the application subtasks are unknown a priori or change from iteration to iteration during execution-time based on the inputs being processed. In such a scenario, it may not be feasible to use the same off-line-derived mapping for each iteration of the application. One possibility is to employ a semi-static methodology that starts with an initial mapping but dynamically performs remapping between application iterations by observing the effects of the changing characteristics of the application's input data, called dynamic parameters, on the application's execution time. A contribution in this paper is to implement and evaluate a semi-static methodology involving the on-line use of off-line-derived mappings. The off-line phase is based on a genetic algorithm (GA) to generate high-quality mappings for a range of values for the dynamic parameters. A dynamic parameter space partitioning and sampling scheme is proposed that partitions the parameter space into a number of hyper-rectangles, within which the “best” mapping for each hyper-rectangle is stored in a mapping table. During the on-line phase, the actual dynamic parameters are observed and the off-line-derived mapping table is referenced to choose the most suitable mapping. Experimental results indicate that the semi-static approach outperforms a dynamic on-line approach and performs reasonably close to an infeasible on-line GA approach. Furthermore, the semi-static approach considerably outperforms the method of using the same mapping for all iterations.
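
    A minimal sketch of the on-line phase (our illustration; the off-line genetic algorithm that fills the table is not shown): the dynamic-parameter space is partitioned into hyper-rectangles, each storing its off-line-derived best mapping, and at each iteration the observed parameters select the enclosing hyper-rectangle's mapping.

        def select_mapping(table, params):
            """On-line phase: return the stored mapping of the hyper-rectangle
            that contains the observed dynamic parameters. Each table entry is
            (lows, highs, mapping); mappings would come from the off-line GA."""
            for lows, highs, mapping in table:
                if all(lo <= p < hi for p, lo, hi in zip(params, lows, highs)):
                    return mapping
            raise ValueError("parameters outside the partitioned space")

        # Two dynamic parameters, e.g. input size and data irregularity in [0, 1).
        table = [
            ((0, 0.0), (1000, 1.0), "mapping-A"),      # small inputs
            ((1000, 0.0), (10**9, 0.5), "mapping-B"),  # large, regular inputs
            ((1000, 0.5), (10**9, 1.0), "mapping-C"),  # large, irregular inputs
        ]
        print(select_mapping(table, (5000, 0.7)))  # mapping-C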

    View the Full Publication
  • 01/01/2006Guest Editors' introduction to the special issue: machine learning approaches to multimedia information retrievalArif Ghafoor, Zhongfei Zhang, Michael Lew, Zhi-Hua Zhou

    View the Full Publication
  • 01/01/2006Challenges in spatiotemporal stream query optimizationHicham Elmongui, Mourad Ouzzani, Walid G. Aref

    Simplified technology and low costs have spurred the use of location-detection devices in moving objects. Typically, these devices send the moving objects' location information to a spatio-temporal data stream management system, which is then responsible for answering spatio-temporal queries about those objects. A large spectrum of research has been devoted to continuous spatio-temporal query processing. However, we argue that several outstanding challenges have been addressed only partially, or not at all, in the existing literature. In particular, this paper focuses on the optimization of multi-predicate spatio-temporal queries on moving objects. We present several major challenges related to the lack of pipelined spatio-temporal operators and to the impact of time, space, and their combination on query plan optimality under different query and object distributions. We show that building an adaptive query optimization framework is key to addressing these challenges and to coping with the dynamic nature of the environment in which such systems operate.
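
    To make the optimization problem concrete, the following Python sketch (an invented toy, not the paper's framework) shows why predicate order matters for a multi-predicate query over a stream of moving objects: both orders return the same answers, but evaluating the more selective predicate first performs fewer predicate checks, and because object distributions drift over time, the cheaper order can change as well, which is what motivates adaptive optimization:

        import random
        random.seed(1)

        # A snapshot of location updates from moving objects.
        stream = [{"x": random.uniform(0, 100), "y": random.uniform(0, 100),
                   "speed": random.uniform(0, 120)} for _ in range(10_000)]

        in_region = lambda o: 40 <= o["x"] <= 60 and 40 <= o["y"] <= 60  # spatial predicate
        is_fast = lambda o: o["speed"] > 100                             # attribute predicate

        def run(predicates):
            """Apply predicates in order; count individual predicate evaluations."""
            checks, results = 0, []
            for obj in stream:
                for p in predicates:
                    checks += 1
                    if not p(obj):
                        break
                else:
                    results.append(obj)
            return len(results), checks

        print(run([in_region, is_fast]))  # same answer count ...
        print(run([is_fast, in_region]))  # ... but a different number of checks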

    View the Full Publication
  • 01/01/2006Natural Variants of AtHKT1 Enhance Na+ Accumulation in Two Wild Populations of ArabidopsisAna Rus, Ivan Baxter, Balasubramaniam Muthukumar, Jeff Gustin, Brett Lahner, Elena Yakubova, David Salt

    Plants are sessile and therefore have developed mechanisms to adapt to their environment, including the soil mineral nutrient composition. Ionomics is a developing functional genomic strategy designed to rapidly identify the genes and gene networks involved in regulating how plants acquire and accumulate these mineral nutrients from the soil. Here, we report on the coupling of high-throughput elemental profiling of shoot tissue from various Arabidopsis accessions with DNA microarray-based bulk segregant analysis and reverse genetics, for the rapid identification of genes from wild populations of Arabidopsis that are involved in regulating how plants acquire and accumulate Na+ from the soil. Elemental profiling of shoot tissue from 12 different Arabidopsis accessions revealed that two coastal populations of Arabidopsis collected from Tossa del Mar, Spain, and Tsu, Japan (Ts-1 and Tsu-1, respectively), accumulate higher shoot levels of Na+ than do Col-0 and other accessions. We identify AtHKT1, known to encode a Na+ transporter, as being the causal locus driving elevated shoot Na+ in both Ts-1 and Tsu-1. Furthermore, we establish that a deletion in a tandem repeat sequence approximately 5 kb upstream of AtHKT1 is responsible for the reduced root expression of AtHKT1 observed in these accessions. Reciprocal grafting experiments establish that this loss of AtHKT1 expression in roots is responsible for elevated shoot Na+. Interestingly, and in contrast to the hkt1–1 null mutant, under NaCl stress conditions, this novel AtHKT1 allele not only does not confer NaCl sensitivity but also cosegregates with elevated NaCl tolerance. We also present all our elemental profiling data in a new open access ionomics database, the Purdue Ionomics Information Management System (PiiMS; http://www.purdue.edu/dp/ionomics). Using DNA microarray-based genotyping has allowed us to rapidly identify AtHKT1 as the causal locus driving the natural variation in shoot Na+ accumulation we observed in Ts-1 and Tsu-1. Such an approach overcomes the limitations imposed by a lack of established genetic markers in most Arabidopsis accessions and opens up a vast and tractable source of natural variation for the identification of gene function not only in ionomics but also in many other biological processes.

    View the Full Publication
  • 01/01/2006SIZ1 Small Ubiquitin-Like Modifier E3 Ligase Facilitates Basal Thermotolerance in Arabidopsis Independent of Salicylic AcidChan Yul Yoo, Kenji Miura, Jing Bo Jin, Jiyoung Lee, Hyeong Park, David Salt, Dae-Jin Yun, Ray Bressan, Paul Hasegawa

    Small ubiquitin-like modifier (SUMO) conjugation/deconjugation to heat shock transcription factors regulates DNA binding of these peptides and the activation of heat shock protein gene expression that modulates thermal adaptation in metazoans. SIZ1 is a SUMO E3 ligase that facilitates SUMO conjugation to substrate target proteins (sumoylation) in Arabidopsis (Arabidopsis thaliana). siz1 T-DNA insertional mutations (siz1-2 and siz1-3; Miura et al., 2005) cause basal, but not acquired, thermosensitivity that occurs in conjunction with hyperaccumulation of salicylic acid (SA). NahG encodes a salicylate hydroxylase; its expression in siz1-2 seedlings reduces endogenous SA accumulation to wild-type levels and further increases thermosensitivity. High temperature induces SUMO1/2 conjugation to peptides in the wild type but to a substantially lesser degree in siz1 mutants. However, heat shock-induced expression of genes, including those for heat shock proteins and ascorbate peroxidases 1 and 2, is similar in siz1 and wild-type seedlings. Together, these results indicate that SIZ1, and by inference sumoylation, facilitates basal thermotolerance through processes that are SA independent.

    View the Full Publication
  • 01/01/2006Mutations in Arabidopsis Yellow Stripe-Like1 and Yellow Stripe-Like3 Reveal Their Roles in Metal Ion Homeostasis and Loading of Metal Ions in SeedsBrian Waters, Heng-Hsuan Chu, Raymond DiDonato, Louis Roberts, Robynn Eisley, Brett Lahner, David Salt, Elsbeth Walker

    Here, we describe two members of the Arabidopsis (Arabidopsis thaliana) Yellow Stripe-Like (YSL) family, AtYSL1 and AtYSL3. The YSL1 and YSL3 proteins are members of the oligopeptide transporter family and are predicted to be integral membrane proteins. YSL1 and YSL3 are similar to the maize (Zea mays) YS1 phytosiderophore transporter (ZmYS1) and the AtYSL2 iron (Fe)-nicotianamine transporter, and are predicted to transport metal-nicotianamine complexes into cells. YSL1 and YSL3 mRNAs are expressed in both root and shoot tissues, and both are regulated in response to the Fe status of the plant. β-Glucuronidase reporter expression, driven by YSL1 and YSL3 promoters, reveals expression patterns of the genes in roots, leaves, and flowers. Expression was highest in senescing rosette leaves and cauline leaves. Whereas the single mutants ysl1 and ysl3 had no visible phenotypes, the ysl1ysl3 double mutant exhibited Fe deficiency symptoms, such as interveinal chlorosis. Leaf Fe concentrations are decreased in the double mutant, whereas manganese, zinc, and especially copper concentrations are elevated. In seeds of double-mutant plants, the concentrations of Fe, zinc, and copper are low. Mobilization of metals from leaves during senescence is impaired in the double mutant. In addition, the double mutant has reduced fertility due to defective anther and embryo development. The proposed physiological roles for YSL1 and YSL3 are in delivery of metal micronutrients to and from vascular tissues.

    View the Full Publication
  • 01/01/2006Identifying model metal hyperaccumulating plants: germplasm analysis of 20 Brassicaceae accessions from a wide geographical areaWendy Peer, Mehrzad Mamoudian, Brett Lahner, Roger Reeves, Angus Murphy, David Salt

    Here we report on the first phase of a funded programme to select a wild relative of Arabidopsis thaliana for use in large-scale genomic strategies, including forward and reverse genetic screens for the identification of genes involved in metal hyperaccumulation.

  • Twenty accessions of metal accumulating species of the Brassicaceae collected from Austria, France, Turkey and the USA during spring–summer 2001 were evaluated.
  • The criteria established for selection were: hyperaccumulation of metal (Ni, Zn); compact growth habit; reasonable time to flowering; production of ≥ 1000 seeds per plant; self-fertility; a compact diploid genome; high sequence identity with A. thaliana; and ≥ 0.1% transformation efficiency with easy selection. As part of this selection process we also report, for the first time, the stable genetic transformation of various hyperaccumulator species with both the green fluorescence protein (GFP) and the bar selectable marker.
  • We conclude that metal hyperaccumulation ability, self-fertility, seed set, transformation efficiency and a diploid genome were the most important selection criteria. Based on an overall assessment of the performance of all 20 accessions, Thlaspi caerulescens Félix de Pallières showed the most promise as a model hyperaccumulator.

    View the Full Publication
  • 01/01/2006An Arabidopsis Basic Helix-Loop-Helix Leucine Zipper Protein Modulates Metal Homeostasis and Auxin Conjugate ResponsivenessRebekah Rampey, Andrew Woodward, Brianne Hobbs, Megan Tierney, Brett Lahner, David Salt, Bonnie Bartel

    The plant hormone auxin can be regulated by formation and hydrolysis of amide-linked indole-3-acetic acid (IAA) conjugates. Here, we report the characterization of the dominant Arabidopsis iaa-leucine resistant3 (ilr3-1) mutant, which has reduced sensitivity to IAA–Leu and IAA–Phe, while retaining wild-type responses to free IAA. The gene defective in ilr3-1 encodes a basic helix-loop-helix leucine zipper protein, bHLH105, and the ilr3-1 lesion results in a truncated product. Overexpressing ilr3-1 in wild-type plants recapitulates certain ilr3-1 mutant phenotypes. In contrast, the loss-of-function ilr3-2 allele has increased IAA–Leu sensitivity compared to wild type, indicating that the ilr3-1 allele confers a gain of function. Microarray and quantitative real-time PCR analyses revealed five downregulated genes in ilr3-1, including three encoding putative membrane proteins similar to the yeast iron and manganese transporter Ccc1p. Transcript changes are accompanied by reciprocally misregulated metal accumulation in ilr3-1 and ilr3-2 mutants. Further, ilr3-1 seedlings are less sensitive than wild type to manganese, and auxin conjugate response phenotypes are dependent on exogenous metal concentration in ilr3 mutants. These data suggest a model in which the ILR3/bHLH105 transcription factor regulates expression of metal transporter genes, perhaps indirectly modulating IAA-conjugate hydrolysis by controlling the availability of metals previously shown to influence IAA–amino acid hydrolase protein activity.

    View the Full Publication
  • 11/18/2004Bicistronic and fused monocistronic transcripts are derived from adjacent loci in the Arabidopsis genomeJyothi Thimmapuram, Hui Duan, Lei Liu, Mary A. Schuler

    Comparisons of the full-length cDNAs and genomic DNAs available for Arabidopsis thaliana, described here, indicate that some adjacent loci are transcribed into extremely long RNAs spanning two annotated genes. Once expressed, some of these transcripts are post-transcriptionally spliced within their coding and intergenic sequences to generate bicistronic transcripts containing two complete open reading frames. Others are spliced to generate monocistronic transcripts coding for fusion proteins with sequences derived from both loci. RT-PCR of several P450 transcripts in this collection indicates that these extended transcripts exist side by side with shorter monocistronic transcripts derived from the individual loci in each pair. The existence of these unusual transcripts highlights variations in the processes of transcription and splicing that could not have been predicted by the algorithms used for genome annotation and splice-site prediction.

    View the Full Publication

About Us

The Cyber Center at Purdue University provides a single venue where IT-related research, hardware, software, and staffing come together, enabling new discoveries that can have immediate impact on discovery, learning, and engagement.


Contact

Cyber Center
Young Hall
155 South Grant Street
West Lafayette, Indiana 47907