
KOM, SEKE, June 20, 2004 Representing Chains of Custody Along a Forensic Process: A Case Study on Kruse Model Tamer Fares Gayed, UQAM Hakim Lounis, UQAM.


1 Representing Chains of Custody Along a Forensic Process: A Case Study on Kruse Model
Tamer Fares Gayed, UQAM; Hakim Lounis, UQAM; Moncef Bari, UQAM
In this presentation, I will talk about AI applied to hydro operations. This work was conducted by a group of people from UQAM, Alcan and CRIM, including students.

2 Presentation outline
Introduction
Problem definition
Why Linked Data for representing chain of custody
Solution framework
Conclusions and perspectives

3 Introduction
The semantic web is the web of data.
Tim Berners-Lee outlined a set of rules for publishing data on the web:
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. Provide useful RDF information about the URIs that are looked up by machines or people.
4. Include RDF statements that link to other URIs.
Publishing data in a structured way facilitates its consumption and helps the consumer make the proper decision.

Introduction to linked data principles, chain of custody, and forensic processes. Linked data principles: everything is addressed using unique URIs; all the URIs are accessible via HTTP interfaces; the URIs refer to objects that are described by machine-interpretable data; the URIs are linked to other URIs.

4 Introduction: CoC?
A CoC (chain of custody) is a chronological document that accompanies all digital evidence in order to avoid later allegations of tampering with that evidence.
A forensic process contains a set of phases; each phase has its own CoC document.
Each CoC answers the 5 Ws and 1 H questions.
The CoCs differ from one forensic process to another.
The 5 Ws are who, when, why, where, and what; the 1 H is how.
Figure: Phase1, Phase2, and Phase x, paired with CoC1, CoC2, and CoCx.
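As a rough illustration of the 5 Ws and 1 H, one CoC document per phase could be modeled as a small record with one field per question. This is only a sketch: the class name CoCRecord, the helper missing_answers, and all field values are hypothetical and are not part of the presented framework.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CoCRecord:
    """One chain-of-custody document for a single forensic phase (hypothetical model)."""
    phase: str
    who: Optional[str] = None
    when: Optional[str] = None
    why: Optional[str] = None
    where: Optional[str] = None
    what: Optional[str] = None
    how: Optional[str] = None


def missing_answers(record: CoCRecord) -> list:
    """Return the names of the 5 Ws and 1 H questions that are still unanswered."""
    questions = [f.name for f in fields(record) if f.name != "phase"]
    return [q for q in questions if getattr(record, q) is None]


# Illustrative values only; they are not taken from a real case.
acquisition = CoCRecord(
    phase="acquisition",
    who="first responder",
    when="2011-03-20",
    what="disk image of the suspect device",
)
print(missing_answers(acquisition))  # ['why', 'where', 'how']
```

A check of this kind could flag a CoC that does not yet answer all six questions before it is handed over.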

5 Introduction: forensic process
The most common forensic process is the Kruse model: it includes the three essential steps required by any cyber forensic investigation. The 3 phases are acquisition, authentication, and analysis.

Acquisition: the operation of acquiring the evidence from suspect storage devices (e.g., hard disk, flash memory, digital camera). It starts by saving the state of the digital system in question so that it can be analyzed later. The first responder is the role player of this activity and is responsible for preserving the scene in the exact state in which it was found [18]. The forensic analysis is not done directly on the suspect's device but on a copy. Thus, after preserving the state of the scene, the role player performs two tasks: recovering and copying. Deleted contents should be restored before the digital data is copied from the suspect storage device to a trusted device. The data is then copied from the suspect's device to a trusted device to prevent tampering with and alteration of the suspect's data.

Authentication: the process of ensuring that the acquired evidence has not been altered and has kept its integrity from the time it was extracted to the time it was transmitted and stored by an authorized source [27]. Any change to the evidence will render it inadmissible in court. Investigators authenticate the digital media by generating a checksum (hash) of its contents (e.g., using the MD5, SHA, or CRC algorithms). A checksum is like an electronic fingerprint in that it is highly unlikely for two digital media with different data to have the same checksum. The aim of this task is to show that the checksums of the seized (suspect) media and of the trusted image are identical.

Analysis: the last and most time-consuming step in this model. In this phase, the investigator tries to uncover the wrongdoing by examining the acquired data, such as files and directories, in order to identify pieces of evidence, determine their significance and probative value, and draw conclusions based on the evidence found. In [66] the author defined the 3 major categories of evidence that should be considered in the analysis phase:
Inculpatory evidence: evidence that supports a given theory
Exculpatory evidence: evidence that contradicts a given theory
Evidence of tampering: evidence that the system was tampered with to avoid correct identification
Analysis of evidence must be accomplished without tainting the integrity of the data.

Figure: each phase (Acquisition, Authentication, Analysis) has its own CoC (CoCAcqui, CoCAuth, CoCAnaly), each answering Who, When, Why, Where, What, and How.
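The checksum comparison described for the authentication phase can be sketched as follows. This is a minimal example, not the tooling used by the authors: SHA-256 is chosen here as an assumption (the slides name MD5, SHA, and CRC), and the byte strings stand in for real device contents.

```python
import hashlib


def digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hexadecimal checksum of the given bytes."""
    return hashlib.new(algorithm, data).hexdigest()


def is_authentic(seized: bytes, image: bytes, algorithm: str = "sha256") -> bool:
    """True when the seized media and its trusted image have identical checksums."""
    return digest(seized, algorithm) == digest(image, algorithm)


# Stand-in byte strings; a real investigation would read the devices in chunks.
seized_media = b"contents of the suspect device"
trusted_image = bytes(seized_media)          # faithful copy
altered_image = seized_media + b" (edited)"  # copy changed after acquisition

print(is_authentic(seized_media, trusted_image))  # True
print(is_authentic(seized_media, altered_image))  # False
```

The resulting hexadecimal digest is the kind of value that a CoC for this phase would record as evidence of integrity.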

6 Problem definition
CF (cyber forensics) is a growing field that must accommodate digital technologies: semantic web standards (RDF, URL, SPARQL) are fertile ground for representing the CoCs.
Judges' awareness and understanding of digital evidence are not enough to evaluate it and take the proper decision about it: representation using LDP (linked data principles) allows the dereferenceability of the represented resources and the execution of queries.
CoCs should be managed only by authorized people, and their integrity should be maintained throughout the investigation process: a security mechanism should be integrated with the represented data to keep its integrity and control its access.

1. Tangible CoCs and all their contents (victim information and forensic information) must also undergo a radical transformation from paper to machine-readable format in order to accommodate this continuous evolution.
2. Juries need to know more about digital evidence. One of the proposed solutions is to organize a syllabus and training program to educate juries in the field of Information and Communication Technology (ICT). The authors propose a solution that lets juries navigate, discover (dereference), and execute different queries on the represented information.
3. A security mechanism should be integrated with the represented data to keep its integrity and to limit and control its access.

7 Why LDP for representing CoCs?
CoC and LDP are metaphors for each other
Interlinking between entities
Interpretation of terms and resources
Inference capabilities (human or automated)
Semantic vocabularies: mixture (schema) for representing forensics data
Provenance metadata: to describe the provenance and complement missing answers about forensics data
Knowledge representation, definition (reuse) of concepts, collaboration between different role players

1. The nature of a CoC is characterized by the interrelation and dependency of information between the different phases of the forensic process; each phase can lead to another one. This interrelation is the basic idea on which linked data is published, discovered, and navigated using RDF links. RDF links in LDP are used not only to relate the different forensic phases to each other but also to assert connections between the entities described in each forensic phase. RDF typed links also enable the data publisher (role player) to state explicitly the nature of the connection between entities in different phases or in the same phase, which is not the case with the untyped hyperlinks used in HTML.
2. Linked data enables links to be set between items/entities in different data sources using a common data model (RDF) and web standards (HTTP, URI, and URL). Likewise, if the CoC is represented using LDP, the items/entities in different phases of the forensic process can be linked together. This generates a space over which different generic applications can be implemented:
Browsing applications: enable juries to view data from one phase and then follow RDF links within the data to other phases of the forensic process.
Search engines: juries can crawl the different phases of the forensic process and issue sophisticated queries.
3. This can be realized using two methodologies. First, by making the URIs that identify vocabulary terms dereferenceable (i.e., HTTP clients can look up the URI using the HTTP protocol and retrieve a description of the resource it identifies), so that client applications can look up the terms, which are defined using RDFS and OWL. Second, by publishing mappings between terms from different vocabularies in the form of RDF links. Thus, for any new term definitions, the consuming applications can retrieve for the juries extra information describing the provided data.
4. Nowadays, RDFS and OWL are partially adopted on the web of data. Both are used to provide vocabularies for describing conceptual models in terms of classes and their properties (definition of proprietary terms). This is useful for juries, who can infer more information from the data at hand using different reasoning engines.
5. LD will be enriched by semantic web vocabularies such as Dublin Core (DC), Friend of a Friend (FOAF), and Semantic Web Publishing. Vocabulary links are one type of RDF link; they point from data to the definitions of the vocabulary terms used to represent the data, as well as from these definitions to the definitions of related terms in other vocabularies. This mixture is called a schema in linked data: a mixture of distinct terms from different vocabularies used to publish the data in question. It may include terms from widely used vocabularies as well as proprietary terms. Thus, several vocabulary terms can be used to represent the forensics data, make it self-descriptive (using the two methodologies mentioned in point 3), and enable linked data applications to integrate the data across vocabularies and enrich the data being published.
6. Provenance metadata can also be published and consumed on the web of data [6]. Such metadata also answers six questions, but at the level of the data's origin (i.e., who published/created the data, where the data was initially published/created, what the published data is, when and why the data was published, and how it was published). These vocabularies can be used together with the forensics data to describe its provenance and complement the missing answers related to the forensic investigation.
7. See report section 3.2.1.
8. Linked data tries to avoid heterogeneity by advocating the reuse of terms from widely deployed vocabularies (agreement on the same ontology).
9. The investigation process is a task shared between different players. The descriptions of the same resource provided by different players allow different views, perspectives, and opinions to be expressed.
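The role of typed RDF links between phases can be illustrated with a small sketch in which triples are plain Python tuples. The base URI and the predicates leadsTo and performedBy are hypothetical names invented for this example; the point is only that a typed link states which kind of connection is being followed, unlike an untyped HTML hyperlink.

```python
# Hypothetical URIs and predicates, used only to illustrate typed links.
BASE = "http://example.org/coc/"
TRIPLES = [
    (BASE + "acquisition", BASE + "vocab#leadsTo", BASE + "authentication"),
    (BASE + "authentication", BASE + "vocab#leadsTo", BASE + "analysis"),
    (BASE + "acquisition", BASE + "vocab#performedBy", BASE + "people/first-responder"),
]


def follow(start: str, predicate: str) -> list:
    """Follow links of one type from `start` until no further link exists."""
    path = [start]
    seen = {start}
    current = start
    while True:
        targets = [o for s, p, o in TRIPLES if s == current and p == predicate]
        if not targets or targets[0] in seen:
            return path
        current = targets[0]
        seen.add(current)
        path.append(current)


phases = follow(BASE + "acquisition", BASE + "vocab#leadsTo")
print([uri.rsplit("/", 1)[-1] for uri in phases])
# ['acquisition', 'authentication', 'analysis']
```

A browsing application would do the same thing over HTTP, dereferencing each URI instead of reading from an in-memory list.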

8 Solution Framework

9 Semantic Web Vocabularies
Built-in vocabularies: RDFS, OWL, DC, FOAF, etc.
Custom vocabularies:
  Created to describe a particular domain
  Used when the built-in vocabularies do not provide all the terms needed to describe the content of a data set
  Created using a lightweight ontology

10 Ex.: Definition of terms
Figure labels: The Who question; authentication#investigated; authentication#Authentication; Class of all investigation; The Authentication Phase; The Class of all authentication tasks.

Definition of a lightweight ontology: use the full advantages of semantic web technologies and a minimum of OWL constructs, and reuse existing RDF vocabularies wherever possible.
This figure shows an example of how a CoC term was defined using lightweight ontologies. The "first-responder" term is defined as a property (rdf:type owl#ObjectProperty) whose range (rdfs:range) is the First-responder class (rdf:type rdf-schema#Class), which is a subclass (rdf-schema#subClassOf) of the Person class (foaf:Person). This term is also the inverse (owl:inverseOf) of the "responded" verb, which has the First-responder class as its domain (rdfs:domain) and the Acquisition class as its range (rdfs:range).
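The term definition described above can be written down as a handful of triples, and the owl:inverseOf declaration can then be used to derive statements that were never entered explicitly. The sketch below uses plain Python sets rather than an RDF library; the "coc:" prefix, the instance names, and the function add_inverse_statements are assumptions made for illustration.

```python
# Prefixed names stand in for full URIs; the "coc:" prefix is a hypothetical namespace.
ONTOLOGY = {
    ("coc:first-responder", "rdf:type", "owl:ObjectProperty"),
    ("coc:first-responder", "rdfs:range", "coc:First-responder"),
    ("coc:first-responder", "owl:inverseOf", "coc:responded"),
    ("coc:First-responder", "rdf:type", "rdfs:Class"),
    ("coc:First-responder", "rdfs:subClassOf", "foaf:Person"),
    ("coc:responded", "rdfs:domain", "coc:First-responder"),
    ("coc:responded", "rdfs:range", "coc:Acquisition"),
}


def add_inverse_statements(data: set) -> set:
    """Return `data` plus the statements implied by owl:inverseOf declarations."""
    inverses = {}
    for s, p, o in ONTOLOGY:
        if p == "owl:inverseOf":
            inverses[s] = o
            inverses[o] = s
    inferred = {(o, inverses[p], s) for s, p, o in data if p in inverses}
    return data | inferred


# Illustrative instance data: one acquisition task and its first responder.
facts = {("coc:acquisition-1", "coc:first-responder", "coc:alice")}
closed = add_inverse_statements(facts)
print(("coc:alice", "coc:responded", "coc:acquisition-1") in closed)  # True
```

A reasoning engine applies rules of this kind generically; the function here hard-codes only the inverse-property rule.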

11 Victim and Forensic Part
This layer describes how the resources of the victim and forensic parts are represented: 303 URIs and hash URIs.
There are different ways to describe any concept:
  A URI identifying the concept itself
  A URI identifying the RDF/XML document describing the concept
  A URI identifying the HTML document describing the concept
Forensic formats can also be represented in the same unified framework (AFF4: an open format for the storage and processing of digital evidence that represents forensics data in the form of RDF triples).

303 URIs (also known as 303 redirects): the server redirects the client's request to another URI, that of a web document describing the concept in question.
Hash URIs: avoid the two HTTP requests required by 303 URIs. A hash URI consists of a base part and a fragment identifier, separated from the base by a hash symbol. When a client requests a hash URI, the fragment part is stripped off before the URI is requested from the server. This means that a hash URI does not necessarily identify a web document and can be used to identify real-world objects.
AFF4: an open format for the storage and processing of digital evidence. Its design adopts a scheme of globally unique identifiers (URNs) for identifying and referring to all evidence. The great advantage of this format is that it represents forensic metadata in the form of RDF triples (subject, predicate, and value), where the subject is the URN of the object the statement is made about and the predicate (e.g., datelogin, datelogout, evidenceid, affiliation, etc.) can be any arbitrary attribute, which can be used to store any object in the AFF4 universe.
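The way a hash URI separates the named object from the document that describes it can be seen with the standard library alone. The URI below is a hypothetical example; only urllib.parse.urldefrag is a real API.

```python
from urllib.parse import urldefrag

# Hypothetical hash URI naming a real-world object (an evidence item).
evidence_uri = "http://example.org/coc/authentication#Evidence01"

# An HTTP client strips the fragment before contacting the server, so the
# request goes to the document URL while the full URI still names the object.
document_url, fragment = urldefrag(evidence_uri)

print(document_url)  # http://example.org/coc/authentication
print(fragment)      # Evidence01
```

With a 303 URI, by contrast, the client would request the concept URI itself and then issue a second request to the document URI returned in the redirect.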

12 CF-CoC Web Application form
The CF-CoC web application form should be designed to:
Import resources from the forensic parts
Import resources from the victim parts
Create and describe resources with the support of:
  Existing terms imported from well-established vocabularies
  New terms imported from the custom vocabulary created to describe the CoC of each forensic phase
Add provenance metadata to the forensics data

13 Pattern Consumption Applications
Three main patterns can be used by juries to consume the CoC information: browsing, searching, and querying.

Browsing: like traditional web browsers, which let users navigate between HTML pages. The same idea applies to linked data, but browsing is performed by navigating between resources, following RDF links and downloading each resource from a separate URL (e.g., RDF browsers such as Disco, Tabulator, or OpenLink).
Searching: RDF crawlers have also been developed to crawl linked data from the web by following RDF links. Juries search the crawled linked data using a keyword related to the item they are interested in.
Querying: juries can also perform extra search filtering using query agents. This type of search is possible when SPARQL endpoints are installed, which allow expressive queries to be asked against the dataset.
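The querying pattern can be approximated without a SPARQL endpoint by matching triple patterns against an in-memory list, with None playing the role of a query variable. This is a stand-in for a real SPARQL engine, not a replacement for one; the "coc:" predicates and the creator value are hypothetical, while the evidence values echo the labels of the usage example in this deck.

```python
# Illustrative triples; the "coc:" predicates are hypothetical names.
DATASET = [
    ("coc:Evidence01", "coc:hashAlgorithm", "MD5"),
    ("coc:Evidence01", "coc:hashValue", "db64e67f5b41bbc0f3728c2eae4f07eb"),
    ("coc:Evidence01", "coc:location", "coc:Machine1"),
    ("coc:NGAuth", "dc:creator", "investigator-1"),
]


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in DATASET
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Roughly: SELECT ?p ?o WHERE { coc:Evidence01 ?p ?o }
for _, p, o in match(subject="coc:Evidence01"):
    print(p, o)

# Roughly: SELECT ?s WHERE { ?s dc:creator "investigator-1" }
print([s for s, _, _ in match(predicate="dc:creator", obj="investigator-1")])
```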

14 Provenance Metadata
The ability to track the origin of data is a key component in building trust, which is required for the admissibility of digital evidence.
Provenance information can be integrated within the forensic process using 3 different methods:
Provenance vocabularies: for example, Dublin Core.
Open Provenance Model (OPM): describes provenance in terms of agents, artifacts, and processes (e.g., OPMV).
Named graph (NG): used to denote a collection of triples with relevant provenance information. The idea of a named graph is to take a set of RDF triples, consider them as one graph, and assign a URI reference to it. RDF can then be used to describe this graph with RDF triples that state, for example, the creator or the retrieval date of the graph.
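The named graph idea can be sketched as a mapping from a graph URI to its set of triples, with provenance expressed as further triples whose subject is that graph URI. All URIs, the "coc:" predicates, and the creator and date values below are assumptions for illustration only.

```python
# A named graph: a URI that identifies a set of triples (all names hypothetical).
NAMED_GRAPHS = {
    "http://example.org/coc/NGAuth": {
        ("coc:Evidence01", "coc:hashAlgorithm", "MD5"),
        ("coc:Evidence01", "coc:location", "coc:Machine1"),
    },
}

# Provenance triples whose subject is the graph URI itself.
PROVENANCE = {
    ("http://example.org/coc/NGAuth", "dc:creator", "investigator-1"),
    ("http://example.org/coc/NGAuth", "dc:date", "2011-03-20"),
}


def provenance_of(graph_uri: str) -> dict:
    """Collect the provenance properties recorded for one named graph."""
    return {p: o for s, p, o in PROVENANCE if s == graph_uri}


info = provenance_of("http://example.org/coc/NGAuth")
print(info["dc:creator"], info["dc:date"])  # investigator-1 2011-03-20
print(len(NAMED_GRAPHS["http://example.org/coc/NGAuth"]))  # 2
```

Because the graph has its own URI, statements about who created it and when stay separate from the forensic statements inside it.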

15 Ex. Abstract NG of Kruse Model
Figure labels: NGAcqui, NGAuth, NGAnaly; dc:publisher, dc:creator; Jean Pierre, Ann Marie; dc:date 20 Mar 2011.
Named graph: the idea of a named graph is to take a set of RDF triples, consider them as one graph, and assign a URI reference to it.

16 Ex.: Usage of custom term & Metadata
Figure labels: Genid:A14471; Evidence01; :34:15Z; AFF4; MD5; Machine1; db64e67f5b41bbc0f3728c2eae4f07eb; NGAuth.
This figure shows an example of how the custom terms (e.g., investigator and investigated) defined using a lightweight ontology are used. Victim resources (e.g., who: Jean-Pierre, defined by Digital Test), forensics resources (e.g., what: evidence, why: hash, where: location, defined in AFF4), and terms from the DC vocabulary (e.g., when: date) are all integrated in a unified framework that answers the six questions of the authentication phase.

17 Public Key Infrastructure
Applying PKI to LOD transforms it into LCD.
It allows juries to verify the identity of the role players who participated in the forensic investigation.
The main idea behind applying PKI to LOD is based on public key (PK) cryptography: senders (role players and the CA) sign using their private keys, and the jury verifies these signatures using the corresponding public keys.

The PKI certifications are applied in this context as follows:
1. Juries send the CA a list of the players who are supposed to work on the current cyber crime case. Sending this list to the CA restricts data access to these players only, which prevents the disclosure of data to unauthorized people (keeps it confidential).
2. The role player generates a public-private key pair ({KU-P, KR-P}), where P is all the information identifying the player, R means private, and U means public. The player keeps the private key in secure storage to preserve its integrity and confidentiality, and then sends the public key KU-P to the CA.
3. The authority signs the player's public key and identifying information P using its private key ({KR-CA}). The resulting data structure, R-CA{P, KU-P}, is sent back to the role player. It is called the public key certificate of the role player, and the authority is called a public key certification authority (symbols outside the braces denote the signature over the data structure).
4. Juries obtain the authority's public key {KU-CA}.
5. Each player creating a CoC must authenticate himself to the juries by signing his RDF graph G using his private key, giving R-P{G} (all triples describing a phase are assembled into one graph called G). Later, before the court session, each player sends the certificate R-CA{P, KU-P} to the juries along with the signed graph R-P{G}.
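Steps 2 to 5 can be traced with a toy signature scheme. The sketch below uses textbook RSA over small, well-known Mersenne primes with no padding, so it is deliberately insecure and only shows who signs what and which key verifies it; a real deployment would rely on a vetted cryptographic library and X.509 certificates. The function names, the identity string, and the serialized graph are all assumptions.

```python
import hashlib


def make_keypair(p: int, q: int, e: int = 65537):
    """Build a textbook RSA key pair from two primes (toy code, no padding)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, e), (n, d)  # (public key KU, private key KR)


def _hash_to_int(message: bytes, n: int) -> int:
    """Reduce the SHA-256 digest of the message to an integer below n."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, private_key) -> int:
    n, d = private_key
    return pow(_hash_to_int(message, n), d, n)


def verify(message: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == _hash_to_int(message, n)


# Known Mersenne primes keep the example short; real keys need proper generation.
ku_ca, kr_ca = make_keypair(2**107 - 1, 2**127 - 1)  # certification authority
ku_p, kr_p = make_keypair(2**61 - 1, 2**89 - 1)      # role player

# Step 3: the CA signs the player's identity P and public key KU-P.
identity = b"P=investigator-1"
cert_body = identity + b"|" + repr(ku_p).encode()
certificate = sign(cert_body, kr_ca)                 # R-CA{P, KU-P}

# Step 5: the player signs the graph G that describes one phase.
graph = b"coc:Evidence01 coc:hashAlgorithm MD5 ."
graph_signature = sign(graph, kr_p)                  # R-P{G}

# The jury checks the certificate with KU-CA, then the graph with KU-P.
print(verify(cert_body, certificate, ku_ca))         # True
print(verify(graph, graph_signature, ku_p))          # True
print(verify(graph + b" tampered", graph_signature, ku_p))  # False
```

The two checks mirror the trust chain in the scenario: the jury trusts KU-CA, the certificate binds P to KU-P, and KU-P in turn validates the signed graph.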

18 PKI Scenario
Figure: message flow of the PKI scenario, with labels 1.; 2. KU-P; 3. R-CA{P, KU-P}; 4. KU-CA; 5. R-CA{P, KU-P}; KR-P; Sign NG.

19 Conclusion and Perspective
1. A new combination of several fields in the same framework: cyber forensics, semantic web, provenance vocabularies, the PKI approach, and LDP.
2. Underline that each phase of the forensic process should have its own CoC, whatever the forensic model.
3. Provide a framework that leads to the creation of an assistance system for juries in a court of law.
4. Integrate provenance metadata with the victim/forensics data, in order to answer questions about the origin of the information published by the role players during the forensic investigation.
5. Use the PKI approach to verify the identity of each player participating in the forensic process.
Transform the tangible CoC into an electronic one consumable by people and machines.

20 Future Work
The current framework will be extended with extra educational resources for assistance purposes.
These educational resources help the role players and juries to publish and consume the represented data, respectively.
