Facilitating Document Annotation using Content and Querying Value.


Abstract A large number of organizations today generate and share textual descriptions of their products, services, and actions. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. While information extraction algorithms facilitate the extraction of structured relations, they are often expensive and inaccurate, especially when operating on text that does not contain any instances of the targeted structured information. We present a novel alternative approach that facilitates the generation of structured metadata by identifying documents that are likely to contain information of interest, where that information will subsequently be useful for querying the database. Our approach relies on the idea that humans are more likely to add the necessary metadata at creation time if prompted by the interface, or that it is much easier for humans (and/or algorithms) to identify the metadata when such information actually exists in the document, rather than naively prompting users to fill in forms with information that is not available in the document.

Abstract con… As a major contribution of this paper, we present algorithms that identify structured attributes that are likely to appear within the document, by jointly utilizing the content of the text and the query workload. Our experimental evaluation shows that our approach generates superior results compared to approaches that rely only on the textual content or only on the query workload to identify attributes of interest.

Existing system There are many application domains where users create and share information; for instance, news blogs, scientific networks, social networking groups, or disaster management networks. Current information sharing tools, like content management software (e.g., Microsoft SharePoint), allow users to share documents and annotate (tag) them in an ad hoc way. Similarly, Google Base [1] allows users to define attributes for their objects or choose from predefined templates. This annotation process can facilitate subsequent information discovery. Many annotation systems allow only “untyped” keyword annotation: for instance, a user may annotate a weather report using a tag such as “Storm Category 3”.
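To make the limitation of "untyped" tags concrete, the following minimal Java sketch contrasts a free-text tag with a typed attribute-value annotation. The class name, method names, and the attribute name StormCategory are hypothetical and chosen only for illustration; they are not taken from the presented system.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedAnnotationDemo {
    // Untyped annotation: a free-text tag that supports only keyword matching.
    static boolean untypedMatches(String tag, String keyword) {
        return tag.toLowerCase().contains(keyword.toLowerCase());
    }

    // Typed annotation (assumed form): attribute name mapped to an integer value,
    // which makes comparison queries such as "at least category 2" possible.
    static boolean typedAtLeast(Map<String, Integer> annotations, String attribute, int min) {
        Integer value = annotations.get(attribute);
        return value != null && value >= min;
    }

    public static void main(String[] args) {
        String tag = "Storm Category 3";

        Map<String, Integer> typed = new HashMap<>();
        typed.put("StormCategory", 3); // hypothetical attribute name

        // A keyword search for "category 2" does not find the category 3 report.
        System.out.println(untypedMatches(tag, "category 2"));
        // A typed query for category >= 2 does find it.
        System.out.println(typedAtLeast(typed, "StormCategory", 2));
    }
}
```

Under these assumptions, the keyword search misses a report that a typed range query would return, which is the kind of information discovery that structured attributes make easier.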

Architecture Diagram

System specification

HARDWARE REQUIREMENTS
Processor : Intel Pentium IV
RAM : 512 MB
Hard Disk : 80 GB HDD

SOFTWARE REQUIREMENTS
Operating System : Windows XP / Windows 7
Front End : Java
Back End : MySQL 5

CONCLUSION We proposed adaptive techniques to suggest relevant attributes to annotate a document, while trying to satisfy the user's querying needs. Our solution is based on a probabilistic framework that considers the evidence in the document content and the query workload. We present two ways to combine these two pieces of evidence, content value and querying value: a model that treats both components as conditionally independent and a linear weighted model. Experiments show that, using our techniques, we can suggest attributes that improve the visibility of the documents with respect to the query workload by up to 50%. That is, we show that using the query workload can greatly improve the annotation process and increase the utility of shared data.
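The two combination strategies named above can be sketched in Java as follows. This is a minimal illustration, not the paper's actual formulas: it assumes that content value and querying value have already been estimated as numbers in [0, 1], it approximates the independence model by a plain product, and the names AttributeScorer, independentScore, linearScore, suggest, and the weight alpha are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AttributeScorer {
    // Assumed independence-style combination: multiply the two pieces of evidence.
    static double independentScore(double contentValue, double queryingValue) {
        return contentValue * queryingValue;
    }

    // Linear weighted combination; alpha in [0, 1] is an assumed tuning weight.
    static double linearScore(double contentValue, double queryingValue, double alpha) {
        return alpha * contentValue + (1.0 - alpha) * queryingValue;
    }

    // Returns the k attribute names with the highest linear score.
    // Each map value is {contentValue, queryingValue}.
    static List<String> suggest(Map<String, double[]> evidence, double alpha, int k) {
        List<String> names = new ArrayList<>(evidence.keySet());
        names.sort(Comparator.comparingDouble(
                (String a) -> linearScore(evidence.get(a)[0], evidence.get(a)[1], alpha))
                .reversed());
        return names.subList(0, Math.min(k, names.size()));
    }

    public static void main(String[] args) {
        // Made-up evidence values, for illustration only.
        Map<String, double[]> evidence = new LinkedHashMap<>();
        evidence.put("StormCategory", new double[] {0.9, 0.7});
        evidence.put("Location", new double[] {0.6, 0.8});
        evidence.put("Author", new double[] {0.3, 0.1});

        System.out.println(suggest(evidence, 0.5, 2));
    }
}
```

In this sketch, setting alpha to 1 ranks attributes by content alone and setting it to 0 ranks them by the query workload alone, so the linear model interpolates between the two single-evidence baselines mentioned in the abstract.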

THANK YOU