Enforcing Policies on Social Media Data Extracted from the Web Nicoletta Fornara and Truc-Vien T. Nguyen Università della Svizzera italiana Lugano, Switzerland.


1 Enforcing Policies on Social Media Data Extracted from the Web Nicoletta Fornara and Truc-Vien T. Nguyen Università della Svizzera italiana Lugano, Switzerland

2 Summary
Web/Internet data collection is becoming increasingly important for many social science fields
Being able to formalize and enforce policies that regulate the collection and use of those data is crucial, especially taking into account the privacy and confidentiality wishes of those who provided the data
Even if such policies are not all enforced by data publishers, fulfilling them is crucial for ethical Internet research
We present the SemPolicy Manager Tool, which is able to enforce a given set of policies by taking into account the meaning of the collected data

3 Web/Internet data collection technologies
Internet data collection by means of
–web service interfaces: software designed to support machine-to-machine interaction over a network, or
–system-specific APIs (Application Programming Interfaces): a specific interface for accessing the data of a data provider
Web data collection by means of web crawlers: software that systematically browses the World Wide Web, building a local repository of the portion of the Web that it visits; very often the purpose is Web indexing
Examples used in the paper:
–Facebook: RestFB, a Facebook Graph API client written in Java
–Twitter: REST API, an interface for programmatic access to read and write Twitter data

4 Types of Policies
Ethical guidelines proposed by various associations for social research (e.g. American Association for Public Opinion Research at point I.A.5)
Legal constraints on the processing of personal data (e.g. the European Union Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data)
–This directive states the necessity of anonymization at point (26), defines the notions of personal data and processing of personal data in Article 2, and constrains personal data processing in Article 8
Web site policies/terms on how the data available on a web site can be used for automatic data collection (e.g. Facebook Automated Data Collection Terms and robots.txt files)
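As an illustration of the last kind of policy, the sketch below checks a robots.txt file before a crawler fetches a page. It uses Python's standard urllib.robotparser module; the robots.txt content, the crawler name "research-crawler", and the example.org URLs are made up for this example and do not come from the presentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "research-crawler" and the URLs below are assumptions for this sketch.
allowed_public = is_allowed(
    ROBOTS_TXT, "research-crawler", "https://example.org/public/page.html"
)
allowed_private = is_allowed(
    ROBOTS_TXT, "research-crawler", "https://example.org/private/profile.html"
)
```

A real crawler would typically download the site's own robots.txt first (for instance with the parser's set_url() and read() methods) and skip every URL for which can_fetch() returns False.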

5 The SemPolicy Manager Tool
Innovative technologies used for realizing the tool:
1. Semantic Web technologies for expressing the meaning of the data
2. Declarative norm formalization and enforcement for expressing policies
3. Natural Language Processing techniques used to enrich the collected data with new semantic information contained in unstructured text

6 Architecture of the SemPolicy Manager Tool

7 Using the SemPolicy Manager Tool (1)
We evaluated the tool on a specific use case: the collection of social network data from Facebook and Twitter, and the enforcement on those data of certain parts of the EU Directive 95/46/EC that state the necessity of anonymizing personal data and data revealing confidential information about people (point 26, Articles 2 and 8).
The enforced policies are:
Policy 1. It is obligatory to make anonymous all personal data relating to an identified or identifiable natural person in order to store, retrieve, and use them. The affected properties include: username, user ID, first name, last name, full name, web site.
Policy 2. It is obligatory to anonymize or remove a text if it reveals racial or ethnic origin, political opinions, religious or philosophical beliefs.
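To make Policy 1 concrete, here is a minimal sketch of one possible way to anonymize the listed properties in a collected record. It is not the tool's actual implementation: replacing values with a keyed hash (HMAC-SHA256) is an assumption made for this example, and the dictionary keys, the "anon-" prefix, and the secret key are hypothetical.

```python
import hashlib
import hmac

# Properties named in Policy 1; the dictionary key spellings are assumptions.
PERSONAL_FIELDS = (
    "username", "user_id", "first_name", "last_name", "full_name", "website",
)


def anonymize_record(record: dict, secret_key: bytes) -> dict:
    """Return a copy of record with personal fields replaced by pseudonyms.

    Each value is replaced by a truncated keyed hash, so the same input
    always maps to the same pseudonym. This is an illustrative choice,
    not the method described in the presentation.
    """
    result = dict(record)
    for field in PERSONAL_FIELDS:
        value = result.get(field)
        if value is not None:
            digest = hmac.new(
                secret_key, str(value).encode("utf-8"), hashlib.sha256
            ).hexdigest()
            result[field] = "anon-" + digest[:12]
    return result


# Hypothetical record and key, for illustration only.
profile = {"username": "jdoe", "first_name": "Jane", "city": "Lugano"}
anonymized = anonymize_record(profile, b"example-secret")
```

Because the hash is keyed and deterministic, two posts by the same user still map to the same pseudonym, which keeps the data usable for analysis. Whether this level of pseudonymization is enough to satisfy the directive is a legal question the sketch does not settle.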

8 Policies 1 and 2 -> 3 Obligations
From Policies 1 and 2 we formalized the following three obligations, each having an activation condition and an action to be performed:
Policy 1-Obligation 1: it is activated when the SN Ontology contains personal data of a user who is not popular. The obliged action consists in retrieving all of the user's personal information and then anonymizing it.
Policy 1-Obligation 2: it is activated when the SN Ontology contains a message (the content of a post, a comment, or a tweet) that includes personal information. The obliged action consists in anonymizing all personal information that appears in the content of posts/comments/tweets.
Policy 2-Obligation 1: it is activated when the semantically enriched collected data contain a statement (post, comment, or tweet) whose content is related to a sensitive topic. The obliged action consists in removing sensitive topics from the content of posts, comments, or tweets.

9 Using the SemPolicy Manager Tool (2)
The Semantic Analysis Component needs to identify in the collected data (posts, comments, and tweets):
1. personal data: first name, last name, full name (of people), web sites (popular names do not need to be anonymized)
2. sensitive data: data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs.
The Enforcement Service is in charge of checking whether the policies stored in the Policy Ontology are active (this depends on the semantic content of the collected data) and of enforcing the active policies.
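The idea of an obligation with an activation condition and an obliged action, checked by an enforcement step, can be sketched as follows. This is a simplified stand-in, not the tool's architecture: the real tool stores policies in an ontology and relies on semantic analysis, whereas this sketch uses plain Python objects and a hypothetical keyword list to detect sensitive topics. All names here (Obligation, SENSITIVE_KEYWORDS, enforce, and so on) are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Message = Dict[str, str]


@dataclass
class Obligation:
    """An obligation: an activation condition plus an obliged action."""
    name: str
    is_active: Callable[[Message], bool]
    action: Callable[[Message], Message]


# Hypothetical keyword list standing in for real semantic analysis.
SENSITIVE_KEYWORDS = {"religion", "election"}


def mentions_sensitive_topic(message: Message) -> bool:
    """Activation condition: the content contains a sensitive keyword."""
    words = {w.strip(".,!?").lower() for w in message["content"].split()}
    return bool(words & SENSITIVE_KEYWORDS)


def remove_content(message: Message) -> Message:
    """Obliged action: return a copy with the content removed."""
    return {**message, "content": "[removed]"}


def enforce(messages: List[Message], obligations: List[Obligation]) -> List[Message]:
    """Apply the action of every obligation whose condition is active."""
    enforced = []
    for message in messages:
        for obligation in obligations:
            if obligation.is_active(message):
                message = obligation.action(message)
        enforced.append(message)
    return enforced


# Hypothetical messages, for illustration only.
posts = [
    {"id": "1", "content": "My view on the election is clear."},
    {"id": "2", "content": "Nice weather in Lugano today."},
]
rules = [Obligation("Policy 2-Obligation 1", mentions_sensitive_topic, remove_content)]
cleaned = enforce(posts, rules)
```

Keeping the activation condition separate from the action mirrors the declarative style described on the slides: a condition can be swapped without touching the code that performs the obliged action.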

10 Evaluation of the SemPolicy Manager Tool
The response time for the enforcement of the three obligations* reaches a stable level at some point, which suggests that the application can be used in practice.
1. The first obligation takes more time than the other ones (with Facebook data it takes 50 minutes with 200 seed users, with Twitter data it takes 12 minutes with 500 users) because Facebook/Twitter users have many private attributes, even more than the number of private data entries found within the messages.
2. The second obligation requires 5 minutes for 200 Facebook seed users and 12 minutes for 500 Twitter users.
3. The third obligation requires 0.20 minutes for 200 Facebook seed users and 0.28 minutes for 400 Twitter users.
* using a PC with Intel(R) Core(TM) 2 Quad CPU Q9650 @ 3.00GHz and 4GB RAM

11 Conclusions
Thanks to the use of Semantic Web technologies for representing the collected data and the policies, it is possible to change the activation condition of the formalized policies without reprogramming the tool
The tool can be used to enforce other policies, but it may be necessary to program the software that executes the obliged action and/or to extend the Semantic Analysis Component
In our future work we plan to study how to improve the user interface of the SemPolicy Manager Tool

12 Thank you for your attention! Questions?

