Presentation on theme: "A View Based Security Framework for XML Wenfei Fan, Irini Fundulaki, Floris Geerts, Xibei Jia, Anastasios Kementsietsidis University of Edinburgh Digital."— Presentation transcript:
A View Based Security Framework for XML Wenfei Fan, Irini Fundulaki, Floris Geerts, Xibei Jia, Anastasios Kementsietsidis University of Edinburgh Digital Curation Center
Introduction XML data management The importance is clearly demonstrated by the wide adoption of XML related technologies in eScience projects Selective exposure of information in XML a primary concern for data providers, curators and consumers. safeguard data confidentiality, privacy and intellectual property
Introduction --- Security View Security View: multiple user groups who wish to query the same XML document different access policies may be imposed, specifying the portions of the document the users are granted or denied access to. Security views are necessarily virtual it is prohibitively expensive to materialize and maintain a large number of views.
Example: a medical records XML database The security admin could see the whole db Hospital Patient Doctor Record Diagnosis Date Name Genetics Psychiatry Record Doctor Date Name Diagnosis Name Patient Name Record Doctor Date Diagnosis Name Patient Name 'David' 'Mary' 'Angela' 'Mark'' Bill Sex Doctor David can only access the records of his patients Patient Mary can access his own medical records
Insurers view An insurer can only read his customers' billing info
Researchers View a medical researcher could retrieve the diagnosis data for research purposes, but not the information on doctors or patients.
System Architecture XML document T View Derivation Security Specification S Query Rewriting Query Evaluation Query Optimization Security View V R for Role U R with XSD D R Query Q R on V R Query Q T on T D R... Security View V D for Role U D with XSD D D Security View V P for Role U P with XSD D P security spec. lang. L S used by admins. view spec. lang. L V transparent to users. view query lang. L QV used by users. doc query lang. L QR transparent to users. legend Indexer Query Editor Security Spec. Editor Result Viewer security admins researchers input module output module core module optional module virtual view XML schema XSD D for document T XML data flow other data flow XML database
Security Specification hospital -> patient* (hospital,patient) = [visit/treatment/medication = autism] patient -> pname, visit*, parent* (patient,pname) = N (patient,visit) = N parent -> patient visit -> treatment, date (visit, treatment) = [medication] treatment -> test + medication (treatment,test) = N
Security Specification Classify the nodes in the XML document accessible nodes inaccessible nodes conditional accessible nodes Support inheritance overriding content-based access privilege context-dependency View derivation module schema availability the availability of an XML schema that specifies the structure of accessible data is critical to the users who can then formulate queries only over this schema.
Query Over the View Regular XPath Query a mild extension of XPath that supports the general Kleene closure (.)* instead of the limited recursion //. Why: XPath is not closed under query rewriting i.e. for an XPath query on a recursively defined view there may not exist an equivalent XPath query on the underlying document
Query Over the Document Regular XPath Query However, the size of the rewritten query Q T, if directly represented in Regualar XPath, may be exponential in the size of input query Q V. We overcome this challenge by employing an automaton characterization of Q T, denoted by MFA(mixed finite state automata), which is linear in the size of Q V. Query Rewriting Module
MFA: Internal Query Representation hospital/patient[(parent/patient)*/visit/treatment/test and visit/treatment[medication/text()=headache]]/pname
Query Evaluation: HyPE We propose a novel algorithm, HyPE (Hybrid Pass Evaluation), for processing Regular XPath queries represented by MFAs. A unique feature of HyPE is that it needs only a single top-down depth-first traversal of the XML tree, during which HyPE both evaluates predicates of the input query (equivalently, AFA's of the MFA) and identifies potential answer nodes (by evaluating the NFA of the MFA). previous systems require to traverse the XML document at least twice to evaluate XPath queries.
HyPE: Cans (candidate answers) The potential answer nodes are collected and stored in an auxiliary structure, referred to as Cans (candidate answers), which is often much smaller than the XML document tree. A pass over Cans is needed to retrieve the real result nodes.
SMOQE: A Reference Implementation We have developed a reference implementation, called SMOQE(Secure MOdular Query Engine), for the security framework we proposed in this paper. It is implemented in Java. demonstrated in VLDB 2006
Conclusion A generic, flexible view based access control framework for protecting XML data and its implementation: SMOQE able to enforce fine-grained access policies according to the structure and values of the protected XML data schema availability view derivation efficient enforcement of security constraints during XML query evaluation Query rewriting Automaton based representation Evaluation using HyPE and optimization