1 Knowledge out there on the Web
Serge Abiteboul
EDBT 2014 keynote, Athens

2 Knowledge out there on the Web: Video of the talk at the Royal Society
Knowledge out there on the Web
Personal knowledge out there

3 Organization
1. The context
2. The personal information management system
   2.1 The concept of Pims
   2.2 Pims are coming
   2.3 Advantages
3. From information to knowledge
4. The Webdamlog language
   4.1 The language in brief
   4.2 Probabilities
   4.3 Access control
5. Conclusion: some research issues

4 1. The context

5 Data explosion
data: pictures, music, movies, reports, email, tweets, contacts, schedules…
social interactions: opinions, annotations, recommendations…
metadata: on photos, documents, music…
ontologies: Alice's ontology and its mappings to other ontologies
web localizations: friends' accounts on FB, Twitter, lists of blogs…
security: credentials on various systems
data in various organizations – jobs, schools, insurances, banks, taxes, medical, retirement…
data at various vendors – Amazon, retailers, Netflix, Apple Store…
data that software or hardware sensors capture, with or without our knowledge – web navigation, phone use, geolocation, "quantified self" measurements, contactless card readings, surveillance camera pictures…

6 Data dispersion
Laptop, desktop, smartphone, tablet, car computer
Residential boxes (TV box), NAS, electronic vaults…
Mail, address book, agenda, to-do lists
Facebook, LinkedIn, Picasa, YouTube, Twitter
Svn, Google Docs, Dropbox
Government services
Business services
Also machines and systems of
– family, friends, associations, work
Systems even unknown to the user
– third-party cookies

7 Data heterogeneity
Type: text, relational, HTML, XML, PDF…
Terminology/structure/ontology
Systems: MS, Linux, iOS, Android
Distribution
Security protocols
Quality: incomplete / inconsistent information

8 Bad news
Limited functionalities because of the silos
– Difficult to do global search, synchronization, task sequencing over distinct systems…
Loss of control over the data
– Difficult to control privacy
– Leaks of private information
Loss of freedom
– Vendor lock-in

9 Growing resentment
Against companies
– Intrusive marketing, cryptic personalization and business decisions (e.g., on pricing), and automated customer service with no real channel for customers' voices
– Creepy "big data" inferences
Against governments
– NSA and its European counterparts
Asymmetry between what these systems know about a person and what the person actually knows

10 Future alternatives (for normal people)
1. Continue with this increasing mess
– Use a shrink to overcome frustration
2. Regroup all your data on the same platform
– Google, Apple, Facebook, …, a newcomer
– Use a shrink to overcome resentment
3. Study 2 years to become a geek
– Geeks know how to manage their information
– Use a shrink to survive the experience
4. And, of course, there is the Pims way

11 2. The personal information management system
2.1 Introduction

12 The Pims
Personal information management system
What is a successful Web service today?
– Some great software
– Some machines on which it runs (and a business model)
Separate the two facets
– Some company provides the software
– It runs on your machine with another business model

13 The Pims (1)
The Pims runs software
– The user chooses the code to deploy on the server
– The software is open source, a requirement for security
With the user's data
– All the user's personal information is on the user's server(s)
– The user owns the server or pays for a hosted one
– The server may be a physical or a virtual machine
– It may be physically located at the user's home (e.g., a TV box) or not
– It may run on a single machine or be distributed among several machines
– The server is in the cloud, i.e., it can be reached from everywhere: a personal cloud

14 The Pims: the 2 main issues
Security
– Enforced by the Pims: guaranteed by the contract the user has with the Pims; a reasonably small piece of code that can be verified
– Enforced by the services running on it: open source, so that we don't need to trust the providers of these systems
– A higher level of security than now
The management
– Should be epsilon-work
– Should require little competence
– A company can be paid to do it (in the cloud)

15 2. The personal information management system
2.2 This is arriving

16 It is becoming possible
System administration is easier
– Abstraction technologies for servers
– Virtualization and configuration management tools
Open source is very active
– Open source technology is more and more available
Price of machines is going down
– A hosted low-cost server is as cheap as 5/month
– Paying is no longer a barrier for a majority of people
Indeed, I am sure you have friends already doing it

17 Many people are working on it
Many systems & projects
– Lifestreams, Stuff I've Seen, Haystack, MyLifeBits, Connections, Seetrieve, Personal Dataspaces, or deskWeb
– YunoHost, Amahi, arkOS, ownCloud or Cozy Cloud
Some on particular aspects
– Mailpile for mail
– Lima for a Dropbox-like service, but at home
– Personal NAS (network-attached storage), e.g., Synology
– Personal data store SAMI of Samsung...
Many more

18 Data disclosure movement
Smart Disclosure in the US
MiData in the UK
MesInfos in France
Several large companies (network operators, banks, retailers, insurers…) have agreed to share with a panel of customers the personal data that they have about them

19 Big companies are interested (1)
Pre-digital companies
E.g., hotels or banks
Disintermediated from their customers by pure Internet players such as Google, Amazon, Mint
In Pims, they can rebuild direct interaction
The playing field is neutral
– Unlike on the Internet where they have less data
They can offer new services without compromising privacy

20 Big companies are interested (2)
Home appliances companies
Many boxes deployed at home or in datacenters
– Internet access provider "boxes", NAS servers, "smart" meters provided by energy vendors, home automation systems, "digital lockers"…
Personal data spaces dedicated to specific usage
Could evolve to become more generic
Control of private Internet of objects

21 2. The personal information management system
2.3 Advantages

22 Advantages
User control over their data
– Who has access to what, under what rules, to do what
User empowerment
– They choose services freely and they can leave a service
Participation in a more neutral Web
– With the "network effects", the main platforms are accumulating data/customers and distorting competition
– The Pims bring back fairness on the Web
– Good practices are encouraged, e.g., interoperability, portability

23 Advantages – New functionalities
Single identity/login
Semantic global search with (personal) ontology
Synchronization/backups across services
Access control management across services
Task sequencing across services
Exchange of information between friends
Connected objects control, a hub for the IoT
Personal big data analysis

24 3. From information to knowledge
(aka let's get a tad more technical)

25 Machines prefer knowledge
Integration of data & information sources
– It is easier to integrate knowledge than information
Collaboration between services & devices
– It is easier for services to collaborate using knowledge than with information
Problem solving based on knowledge inference

26 Humans as well
The users of the system are human beings
– They want support for managing information
– But they are not geeks
– They don't want to program
To facilitate the interactions between humans and machines, we should use declarative languages!

27 It all started with datalog
Popular in the 90s
Some followers in the 00s
– A., Afrati, Atzeni, Cali, Greco, Gottlob, Milo, Sacca, Ullman…
Recent revival
– 2010: Oege de Moor's workshop @Oxford, Datalog 2.0
– 2010: Joe Hellerstein's keynote @PODS, Datalog Redux: Experience and Conjecture
– 2014: Frank Neven's keynote @ICDT, Remaining CALM in declarative networking
Now featuring: Webdamlog

28 Digression: knowledge acquisition
Extraction of knowledge from information
– In the style of Yago's extraction
– Alignment between ontologies (Paris system)
Production of knowledge by services
Mining of knowledge by data analysis/mining
Inference of knowledge (inference engines)
The machines produce the knowledge

29 Requirement 1: Distribution
Different machines
Different users
We use the notion of principal here
– family@alice(Bob)
– agenda@Alice-iPhone(…)
– friends@Alice-FaceBook(…)
A principal comes with identity and privileges

30 Requirement 2: Privacy
Control of who sees what in a distributed environment
Access control
It should be clear from the first part of the talk that this is a most important issue
Tutorial on privacy by Nicolas Anciaux, Benjamin Nguyen, Iulian Sandu Popa
– Today at 2:00

31 Requirement 3: Probabilities
We have to deal with negation
– Elvis was not French
With negations come contradictions
– Elvis Presley died in 1977; The King is alive
There are different points of view
– Elvis's music is the best; it stinks
Measure uncertainty with probabilities
"The more I see, the less I know for sure." John Lennon

32 So, what is the goal?
A datalog-style language with
– distribution
– access control
– probabilities
We are lucky, there is such a language: Webdamlog

33 4. The Webdamlog language (aka let's be serious)
4.1 Webdamlog in brief

34 Facts and rules
Facts are of the form R@p(a1,…,an)
– p is a principal, e.g., Serge, Serge's-iPhone, Facebook/Serge
Rules are of the form $R@$P($U) :- $R1@$P1($U1), ..., $Rn@$Pn($Un)
– $R, $Ri are relation terms
– $P, $Pi are peer terms
– $U, $Ui are tuples of terms
– Safety condition
– Also negations: ignored here

35 The semantics of rules
Classification based on locality and nature of head predicates (intensional or extensional)
Local rule at my-laptop: all predicates in the body of the rule are from my-laptop
– Local with local intensional head: datalog
– Local with local extensional head: database update
– Local with non-local extensional head: messaging between peers
– Local with non-local intensional head: view definition
– Non-local: general delegation
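
The five cases on this slide can be read as a small decision procedure. The following Python sketch is only an illustration under simplifying assumptions: peers and relation names are constants (no $P or $R variables), and the names `Atom`, `Rule` and `classify` are hypothetical, not part of Webdamlog. A relation is called intensional when it is derived by rules and extensional when it is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    relation: str
    peer: str


@dataclass(frozen=True)
class Rule:
    head: Atom
    body: tuple  # tuple of Atom


def classify(rule, here, intensional):
    """Return the slide's category for `rule`, as seen from peer `here`.

    `intensional` is the set of (relation, peer) pairs declared intensional;
    every other relation is treated as extensional (an assumption).
    """
    # A single non-local body atom makes the whole rule non-local.
    if any(atom.peer != here for atom in rule.body):
        return "general delegation"
    head_is_local = rule.head.peer == here
    head_is_intensional = (rule.head.relation, rule.head.peer) in intensional
    if head_is_local:
        return "datalog" if head_is_intensional else "database update"
    return "view definition" if head_is_intensional else "messaging between peers"


intensional = {("fof", "my-laptop"), ("boyMeetsGirl", "gossip-site")}
rule = Rule(Atom("fof", "my-laptop"), (Atom("friend", "my-laptop"),))
print(classify(rule, "my-laptop", intensional))
```

The check on the body comes first because locality of the body decides whether the rule can be evaluated in place at all; only then does the nature of the head matter.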

36 Local rules with local head
Intensional local head – datalog
[at my-iphone]
fof@my-iphone($x,$y) :- friend@my-iphone($x,$y)
fof@my-iphone($x,$y) :- friend@my-iphone($x,$z), fof@my-iphone($z,$y)
Extensional local head – database updates
[at my-iphone]
believe@my-iphone(Alice, $loc) :- tell@my-iphone($p, Alice, $loc), friend@my-iphone($p)
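
The two fof rules are ordinary recursive datalog: fof is the transitive closure of friend. A minimal Python sketch of naive bottom-up evaluation, assuming friend is a binary relation given as a set of pairs (the function name and sample data are made up for illustration, and this is not how a Webdamlog engine is necessarily implemented):

```python
def friends_of_friends(friend):
    """Naive fixpoint for:
         fof(x, y) :- friend(x, y)
         fof(x, y) :- friend(x, z), fof(z, y)
    """
    fof = set(friend)  # first rule: every friend pair is a fof pair
    while True:
        # second rule: join friend(x, z) with fof(z, y)
        derived = {(x, y) for (x, z) in friend for (z2, y) in fof if z == z2}
        if derived <= fof:  # nothing new: fixpoint reached
            return fof
        fof |= derived


friend = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}
print(sorted(friends_of_friends(friend)))
```

Each pass applies the recursive rule once to everything known so far, so the loop stops after at most as many passes as the longest friendship chain.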

37 Local rules & non-local extensional head
Messaging between peers
$message@$peer($name, "Happy birthday!") :- today@my-iphone($date), birthday@my-iphone($name, $message, $peer, $date)
Example
– today@my-iphone(3/25)
– birthday@my-iphone("Manon", sendmail, , 3/25)
– derived: the fact ("Manon", "Happy birthday!") is sent to relation sendmail at the peer named in the birthday fact
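
In this rule both the relation name and the peer of the head are variables, so the data in birthday decides where each message goes. A small Python sketch of that evaluation, assuming birthday facts are 4-tuples (name, message relation, peer, date); the peer names below are hypothetical placeholders, as is the second fact:

```python
def birthday_messages(today, birthdays, greeting="Happy birthday!"):
    """For each birthday fact whose date is today, emit one outgoing fact
    (relation, peer, tuple) to be delivered to `relation` at `peer`."""
    outbox = []
    for name, message, peer, date in birthdays:
        if date == today:
            outbox.append((message, peer, (name, greeting)))
    return outbox


birthdays = [
    ("Manon", "sendmail", "manon-mailbox", "3/25"),  # peer name is hypothetical
    ("Bob", "sms", "bob-phone", "7/14"),             # hypothetical extra fact
]
print(birthday_messages("3/25", birthdays))
```

Because the head relation is extensional at the remote peer, each emitted triple corresponds to a fact inserted there, which is why the slide calls this messaging.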

38 Local rules & non-local intensional head
View definition
boyMeetsGirl@gossip-site($girl, $boy) :- girls@my-iphone($girl, $event), boys@my-iphone($boy, $event)
Semantics: boyMeetsGirl@gossip-site is a join of the relations girls and boys from my-iphone
Defines a view at some other peer

39 Non-local rules
General delegation (at my-iphone):
boyMeetsGirl@gossip-site($girl, $boy) :- girls@my-iphone($girl, $event), boys@alice-iphone($boy, $event)
Example: girls@my-iphone(Alice, Julia's birthday)
– my-iphone installs the following rule at alice-iphone:
boyMeetsGirl@gossip-site(Alice, $boy) :- boys@alice-iphone($boy, Julia's birthday)
Useful to distribute work and exchange knowledge
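
Delegation can be pictured as partial evaluation: the local peer matches the leading local atoms against its own facts, substitutes the bindings it found, and ships the remaining rule to the next peer. The Python sketch below reproduces the slide's example under assumptions of my own: atoms are (relation, peer, args) triples, variables are strings starting with "$", and the helper names are hypothetical. It requires Python 3.8+ for the `:=` operator.

```python
def match(atom_args, fact_args, binding):
    """Extend `binding` so that atom_args matches fact_args, or return None."""
    if len(atom_args) != len(fact_args):
        return None
    binding = dict(binding)
    for term, value in zip(atom_args, fact_args):
        if term.startswith("$"):
            if binding.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None      # constant mismatch
    return binding


def substitute(atom, binding):
    relation, peer, args = atom
    return (relation, peer, tuple(binding.get(a, a) for a in args))


def delegate(head, body, here, local_facts):
    """Evaluate the leading atoms of `body` located at `here`, then return
    the residual rules (head, remaining body), one per binding found."""
    bindings = [{}]
    i = 0
    while i < len(body) and body[i][1] == here:
        relation, _, args = body[i]
        bindings = [b2 for b in bindings
                    for fact in local_facts.get(relation, ())
                    if (b2 := match(args, fact, b)) is not None]
        i += 1
    rest = body[i:]
    return [(substitute(head, b), tuple(substitute(a, b) for a in rest))
            for b in bindings]


head = ("boyMeetsGirl", "gossip-site", ("$girl", "$boy"))
body = (("girls", "my-iphone", ("$girl", "$event")),
        ("boys", "alice-iphone", ("$boy", "$event")))
facts = {"girls": {("Alice", "Julia's birthday")}}
for residual in delegate(head, body, "my-iphone", facts):
    print(residual)
```

With the single girls fact from the slide, the one residual rule is the rule that my-iphone installs at alice-iphone.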

40 The thesis
The Web should turn into a distributed knowledge base where peers share facts and rules, and collaborate
The language Webdamlog is a first step towards that goal
Missing
– Probabilities
– Access control

41 4. The Webdamlog language
4.2 Probabilities

42 Advertisement
Deduction with Contradictions in Datalog
S.A., Daniel Deutch and Victor Vianu
– Tomorrow, 11:00

43 4. The Webdamlog language
4.3 Access control

44 Requirements
Data access
– Users would like to control who can read and modify their information
Data dissemination
– Users would like to control how their data are transferred from one participant to another
Application control
– Users would like to control which applications can run on their behalf, and what information these applications can access

45 The general picture
Coarse grain for extensional relations
– read access to the relation
Fine grain for intensional relations
– read access to tuple t requires read access to the tuples that lead to deriving t
Delegation controlled in a sandbox
Focus on read privilege here

46 Read: default
Extensional relations
– if you have read privilege to the relation
Intensional relations
– if you have read privilege to the relation &
– if you can read all the tuples that have been used to create this fact
– provenance of the fact

47 Coarse grain access control
[at Alice] album@Bob($p,$f) :- photo@Alice($p,$f), friend@Alice($f)
– album@Bob is extensional
– Whoever has read access to album@Bob sees the whole relation
– The default for extensional relations is very permissive

48 Fine grain access control
[at Bob] album@Alice($p,$f) :- photo@Bob($p,$f)
[at Sue] album@Alice($p,$f) :- photo@Sue($p,$f)
– album@Alice is intensional
– Both Bob and Sue contribute to it
– Peter, who has read privilege only on album@Alice and photo@Bob, does not see the photos of Sue
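
Peter's situation can be checked mechanically if each derived fact carries its provenance. The following Python sketch is a simplification with assumptions of my own: provenance is recorded as the set of base relations a fact was derived from (the slides speak of tuples, which coincides with relations here because extensional relations are controlled at relation granularity), and the function name and photo names are hypothetical.

```python
def visible_facts(view, derived, readable):
    """Facts of an intensional `view` that a reader may see.

    derived:  list of (fact, provenance), provenance being the set of base
              relations used to derive the fact
    readable: set of relations the reader holds read privilege on
    """
    if view not in readable:
        return []
    # A fact is visible only if everything it was derived from is readable.
    return [fact for fact, provenance in derived if provenance <= readable]


album_at_alice = [
    (("beach.jpg", "Bob"), {"photo@Bob"}),   # contributed by Bob's rule
    (("party.jpg", "Sue"), {"photo@Sue"}),   # contributed by Sue's rule
]
peter = {"album@Alice", "photo@Bob"}
print(visible_facts("album@Alice", album_at_alice, peter))
```

The same view therefore shows different contents to different readers, which is the sense in which the control is fine grained.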

49 Paranoiac access control
[at Bob] album@Alice($p,$f) :- photo@Bob($p,$f), friends@Bob($f)
– Issue: you can read Bob's photos only if you have read privilege on friends@Bob, which Bob wants to keep private

50 Declassification
[at Bob] photo@Alice($p,$f) :- photo@Bob($p,$f), [ hide friends@Bob($f) ]
– Hide: blocks the provenance from friends@Bob
– Bob declassifies this data just for the evaluation of this rule
– You can declassify only tuples on which you own the grant privilege
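
One way to picture hide is that a hidden body atom is still used to evaluate the rule but is left out of the provenance that readers must be cleared for. The Python sketch below contrasts the rule with and without hide; it is an illustration only, with provenance tracked per relation and with hypothetical function names, not Webdamlog's actual mechanism.

```python
def required_reads(body, author_grants):
    """Relations a reader must be able to read to see the rule's output.

    body:          list of (relation, hidden) pairs
    author_grants: relations on which the rule's author owns the grant
                   privilege; only those may be hidden (declassified)
    """
    required = set()
    for relation, hidden in body:
        if hidden:
            if relation not in author_grants:
                raise PermissionError(f"no grant privilege on {relation}")
        else:
            required.add(relation)
    return required


def can_read(view, body, author_grants, readable):
    return view in readable and required_reads(body, author_grants) <= readable


reader = {"album@Alice", "photo@Alice", "photo@Bob"}  # no access to friends@Bob
bob_grants = {"photo@Bob", "friends@Bob"}
without_hide = [("photo@Bob", False), ("friends@Bob", False)]
with_hide = [("photo@Bob", False), ("friends@Bob", True)]
print(can_read("album@Alice", without_hide, bob_grants, reader))
print(can_read("photo@Alice", with_hide, bob_grants, reader))
```

Scoping the declassification to one rule keeps friends@Bob private everywhere else, and the grant check prevents a peer from declassifying data it does not control.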

51 Issues with non-local rules
[at Bob]
message@Sue(I hate you) :- date@Alice(d)
aliceSecret@Bob(x) :- date@Alice(d), secret@Alice(x)
Ignoring access rights, by delegation, this results in running
[at Alice]
message@Sue(I hate you) :- date@Alice(d)
aliceSecret@Bob(x) :- date@Alice(d), secret@Alice(x)

52 Default solution: sandbox
We run the rule at Alice in a sandbox
We use the access rights of Bob
– So the second rule does not succeed in sending secrets
The message specifies that this is done at Bob's request
– So it requires authentication/signatures
Alternative: delegation without sandbox
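
The sandbox idea can be sketched as follows: a delegated rule runs at the hosting peer but is checked against the read privileges of the peer that delegated it, and whatever it produces is tagged with that peer's name. This Python sketch is illustrative only; the access-control lists are assumptions (in particular that Bob may read date@Alice), and the function name is hypothetical.

```python
def run_in_sandbox(rules, delegator, read_acl):
    """Run delegated rules with the delegator's rights, not the host's.

    rules:    list of (head, body) with body a list of relation names
    read_acl: relation -> set of principals allowed to read it
    Returns the heads that may fire, each tagged with the requester.
    """
    allowed = []
    for head, body in rules:
        if all(delegator in read_acl.get(relation, set()) for relation in body):
            allowed.append({"head": head, "on_behalf_of": delegator})
    return allowed


# Assumed ACLs at Alice: Bob may read date@Alice but not secret@Alice.
read_acl = {"date@Alice": {"Alice", "Bob"}, "secret@Alice": {"Alice"}}
rules = [
    ("message@Sue", ["date@Alice"]),
    ("aliceSecret@Bob", ["date@Alice", "secret@Alice"]),
]
print(run_in_sandbox(rules, "Bob", read_acl))
```

Only the first rule passes, and its output names Bob as the requester; a real system would additionally need signatures so that Sue can verify that claim.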

53 5. Conclusion: some research issues

54 In a classical database, everything that is not in the db is assumed to be false – closed world
On the Web, if you don't know something, it may be out there… or not – open world
Difficulties
– We cannot bring all the Web locally
– We cannot visit all the Web
– How do I know where to find something?
– How do I know it is not out there?

55 Explaining
Users want to understand the information they see, the answers they are given
– In their professional/social life
Difficulties
– Reasoning with a large number of facts
– Information is often probabilistic and not public
– Requires knowing how the information was obtained (its provenance)

56 Privacy
Users want to control their data
– They have personal data
The systems want to get their personal data
How do we manage that?
– Better systems
– Better laws
– Better users (digital literacy)

57 Serendipity
You may hear by chance a song that is going to totally obsess you
A librarian may suggest that you read an article that will transform your research
This is serendipity
A perfect search engine
A perfect recommendation system
A perfect computer assistant
Such systems are boring
They lack serendipity
Design programs that would introduce serendipity in our lives

58 Hypermnesia
Exceptionally exact or vivid memory, especially as associated with certain mental illnesses
For a user: we cannot live knowing that any word, any move will leave a trace
For the ecosystem: we cannot store all the data we produce – lack of storage resources
"Forgetting is Key to a Healthy Mind", Scientific American (image: Aaron Goodman)
A main issue is to select the information we choose to keep

59 Babel of human-machine interaction
Each time a user interacts with a data source, does he have to use the ontology of that source? No!
Instead of a user adapting to the ontologies of the N systems he uses each day, we want the N systems to adapt to the user's ontology

60 Religion…science…machines
Knowledge used to be determined by religion
Knowledge used to be determined scientifically
Knowledge will now be determined by machines?
Decisions are increasingly made by machines
– Stock market (automatic trading)
– Fully automated factory
– Fully automated metros
– Death penalty (killer drones)…

61 to the digital world!
We will soon be living in a world surrounded by machines that
– acquire knowledge and decide for us
What will we do with that technology?
Will we become smarter?
Will we become master or slave of the new technology?

62 Thank you (σας ευχαριστώ)
