Low-bandwidth Semantic Web


1 Low-bandwidth Semantic Web
Onno Valkering. Supervisor: Victor de Boer. Second reader: Stefan Schlobach. 25 May 2019.

2 Low-bandwidth Semantic Web
Contents
- Why Semantic Web and Low-bandwidth?
- SPARQL over SMS
- Challenges
- Experiments
- Practical validation
- Conclusion

3 Low-bandwidth Semantic Web
Why? Semantic Web
- Knowledge sharing
- The WWW is quite good
- Open Linked Data

- Introduce DigiVet, used by farmers in rural areas of developing countries.
- There are more examples; what they have in common is valuable knowledge generated by users.
- We want to share that knowledge between users through the application.
- One way to achieve this is a custom solution, but that makes it an exclusive club.
- We want to use something that other people also use and can use.
- A lot of people use the Web for this, so we want to use that!
- More specifically, the Semantic Web (SW) provides the right tools for our knowledge-sharing needs.

DigiVet 3.0 on a Kasadaka, by Gossa Lô and Romy Blankendaal.

4 (Some) Semantic Web in a Nutshell
Data is represented as triples of subject (S), predicate (P) and object (O), which combined make up graphs. By using URIs we can easily reuse common definitions and entities defined by other data sets. When you want to know more about a certain entity or property, just go to the URL used. In this case we state that Bob is interested in the Mona Lisa, and instead of defining what the Mona Lisa is, we use an already existing definition.
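To make the triple idea concrete, here is a minimal sketch that stores the Bob and Mona Lisa statement as a (subject, predicate, object) tuple and writes it out as N-Triples. The URIs did not survive in this transcript, so the ones below are illustrative assumptions in the style of the W3C RDF 1.1 Primer, and `to_ntriples` is a hypothetical helper, not part of any library.

```python
# A triple is (subject, predicate, object); a graph is a set of triples.
# All URIs below are illustrative assumptions, not taken from the slide.
BOB = "http://example.org/bob#me"
TOPIC_INTEREST = "http://xmlns.com/foaf/0.1/topic_interest"
MONA_LISA = "http://www.wikidata.org/entity/Q12418"

graph = {(BOB, TOPIC_INTEREST, MONA_LISA)}


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines, sorted for stable output."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


print(to_ntriples(graph))
```

The object is only a reference: whoever wants to know what the Mona Lisa is can follow the URI instead of relying on a local definition.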

5 (Some) Semantic Web in a Nutshell
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. When a lot of people start using this, you end up with a huge graph that contains a lot of knowledge. It would be nice to have such a graph based on applications used in rural areas of developing countries.

6 Low-bandwidth Semantic Web
Why? Low-bandwidth
- Rural areas of developing countries
- Dependent on GSM network
- Cost-effective messaging

But, surprise: there is no internet, so there is no Web infrastructure. We depend on cellular networks for our machine-to-machine (M2M) communication, which means SMS. And since SMSes aren't free, we need to be thrifty with sending them. To me, this qualifies as a low-bandwidth network.

7 Low-bandwidth Semantic Web
SPARQL over SMS
- Enable (Semantic) Web data exchange over GSM networks.
- Practical differences between HTTP and SMS:
  - SMS works with phone numbers, HTTP works with URLs.
  - SMS has a size restriction, HTTP practically has none.
  - SMS is one-way messaging, HTTP follows request-response.
- Basic M2M communication based on SPARQL.

In short, we want to use Web-like data exchange over GSM networks. The differences above give an idea of the gap between HTTP, the de facto Web protocol, and SMS. Following the saying "think big, start small", the initial implementation is based on a subset of SPARQL. SPARQL is a language to query the kind of graphs we saw earlier. With this subset, basic read/write M2M communication can be achieved.
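The size restriction means a single query or result may have to travel as several SMSes. The sketch below shows one way to split and reassemble a payload. It is not the project's actual wire format: the `index/total:` header and both function names are hypothetical, and the 140-character default is the figure mentioned in the practical validation.

```python
def split_into_sms(payload: str, limit: int = 140) -> list[str]:
    """Split payload into parts of at most `limit` characters.

    Each part starts with a hypothetical 'index/total:' header so that the
    receiver can put parts back in order.
    """
    header_room = 6  # room for the longest header allowed here: "99/99:"
    body_size = limit - header_room
    if body_size <= 0:
        raise ValueError("limit too small for the header")
    bodies = [payload[i:i + body_size]
              for i in range(0, len(payload), body_size)] or [""]
    if len(bodies) > 99:
        raise ValueError("payload needs more than 99 parts")
    total = len(bodies)
    return [f"{i}/{total}:{body}" for i, body in enumerate(bodies, start=1)]


def join_sms(parts: list[str]) -> str:
    """Reassemble parts that may have arrived in any order."""
    indexed = []
    for part in parts:
        header, body = part.split(":", 1)  # the header itself has no colon
        index, _total = header.split("/")
        indexed.append((int(index), body))
    return "".join(body for _, body in sorted(indexed))


query = "SELECT ?s WHERE { ?s ?p ?o }" * 20
parts = split_into_sms(query)
print(len(parts), "SMSes needed")
```

Every extra part costs money, which is why the rest of the talk focuses on shrinking the payload before it is split.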

8 Low-bandwidth Semantic Web
SPARQL over SMS. I won't go into much technical detail, but I want to give an idea of the concept.

9 Low-bandwidth Semantic Web
SPARQL over SMS. We're tricking the application by mimicking the behavior of the database on the other end. Technically, the application and the database are talking to each other as they would over the Web.

10 Low-bandwidth Semantic Web
Challenges
- Blending synchronous and asynchronous messaging
- SPARQL compression
- RDF compression
- Unpredictable query result sizes

This is easier said than done; there are a few challenges. We performed experiments to establish our compression methods.

11 RDF Compression: Experiment setup
- Set of real-world RDF data
- Quarter of a million files
- Four different strategies:
  - Serialization format
  - Text compression
  - Dictionary encoding
  - Reasoning

We acquired almost a quarter of a million real-world files, ranging from 1 to 1000 triples, and subjected them to four compression strategies. We measured the compression rate of each strategy to come up with the best method.

12 RDF Compression: Experiment setup
Serialization formats
- Five in total: N-Triples, Turtle, RDF/XML, EXI, HDT

This is the format in which the triples are stored; some formats are more verbose than others. We serialized each file into these formats and recorded the compression rate.

13 RDF Compression: Experiment setup
Text compression
- Only gzip compression
- Applied after serialization

In addition, we applied gzip compression to each serialization format and again recorded the compression rate.
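As a rough illustration of this step, the sketch below gzips a small serialized file with Python's standard `gzip` module. The sample triples are made up, and "compression rate" is computed here as compressed size divided by original size, which is an assumption: the slides do not say how the rate was defined.

```python
import gzip

# Hypothetical sample data: 20 N-Triples lines that share the same URIs.
lines = [
    f"<http://example.org/animal/{i}> "
    f"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    f"<http://example.org/vocab#Animal> ."
    for i in range(20)
]
serialized = "\n".join(lines).encode("utf-8")

# Text compression is applied after serialization, as on the slide.
compressed = gzip.compress(serialized)

# Assumed definition of the rate: compressed size / original size.
rate = len(compressed) / len(serialized)
print(f"{len(serialized)} bytes -> {len(compressed)} bytes (rate {rate:.2f})")
```

Repeated URIs compress well, so the gain grows with the number of triples. For very small inputs the gzip header and trailer can make the output larger than the input.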

14 RDF Compression: Experiment setup
Dictionary-encoding
- Shared vocabulary files
- Common definitions
- Top 10, 20, 30 (popularity)
- Predefined IDs for URIs

This kind of compression is based on vocabulary files: sets of reusable common definitions for our types and properties. For each URI in a vocabulary an ID is generated. When we go through the RDF file, known URIs are replaced by the shorter ID. We performed this with the Top 10, Top 20 and Top 30 most popular vocabularies.
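The replacement step can be sketched as follows. This is an illustration of the general idea only: the `#n` ID scheme and the function names are hypothetical, and the three vocabulary URIs stand in for a full vocabulary file.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def build_dictionary(vocabulary_uris):
    """Assign a short ID to every URI in the shared vocabularies.

    Sorting makes the IDs deterministic, so sender and receiver can derive
    the same table without ever transmitting it.
    """
    return {uri: f"#{i}" for i, uri in enumerate(sorted(vocabulary_uris))}


def encode(triples, dictionary):
    """Replace known URIs by their ID; unknown terms pass through unchanged."""
    return [tuple(dictionary.get(term, term) for term in t) for t in triples]


def decode(triples, dictionary):
    """Inverse of encode. Assumes no original term looks like an ID."""
    reverse = {short: uri for uri, short in dictionary.items()}
    return [tuple(reverse.get(term, term) for term in t) for t in triples]


dictionary = build_dictionary([RDF_TYPE, FOAF_PERSON, FOAF_NAME])
triples = [
    ("http://example.org/bob", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/bob", FOAF_NAME, "Bob"),
]
encoded = encode(triples, dictionary)
print(encoded)
```

Only the IDs travel over the network, which works because both ends already hold the same vocabulary files.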

15 RDF Compression: Experiment setup
Reasoning
- Shared vocabulary files
- Common definitions
- Top 10, 20, 30 (popularity)
- Semantic redundancies
- 12 RDFS patterns
- 2 OWL patterns

This strategy is also based on the vocabularies, but it searches for semantic redundancies using 14 patterns. We'll see how this works later, during the Practical Validation.
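A semantic redundancy is a triple that the receiver could infer from the other triples plus the shared vocabulary, so it does not need to be sent. The sketch below applies a single pattern, the RDFS domain rule: if the vocabulary says property p has domain C, then any subject using p is of type C. Whether this rule is among the 14 patterns used in the experiment is an assumption, and the function names are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Taken from the shared vocabulary: foaf:knows rdfs:domain foaf:Person.
DOMAINS = {FOAF_KNOWS: FOAF_PERSON}


def strip_domain_redundancy(data, domains):
    """Drop (s rdf:type C) triples that follow from a property's domain."""
    inferable = {(s, RDF_TYPE, domains[p]) for s, p, o in data if p in domains}
    return [t for t in data if t not in inferable]


def restore(data, domains):
    """Receiver side: re-derive the dropped type triples."""
    restored = list(data)
    for s, p, o in data:
        if p in domains:
            inferred = (s, RDF_TYPE, domains[p])
            if inferred not in restored:
                restored.append(inferred)
    return restored


data = [
    ("http://example.org/bob", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/bob", FOAF_KNOWS, "http://example.org/alice"),
]
compressed = strip_domain_redundancy(data, DOMAINS)
print(len(data), "->", len(compressed), "triples")
```

The type triple for Bob is dropped because the `foaf:knows` triple already implies it, and `restore` brings it back on the other end.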

16 RDF Compression: Results
Number of SMSes / Avg. number of triples (black line) / Avg. number of triples (orange line): 1 2 3 6 8 4 9 16 5 21 24 66 84 7 98 116 126 175 189 10 301

Short summary of the results. The black line shows the first two strategies (general-purpose): serialization format and text compression. We can see some nice compression rates, which increase as the number of triples increases. More interesting is the orange line, which adds the last two strategies (Semantic Web specific): dictionary encoding and reasoning. By using Semantic Web specific features, we can compress data even more! The table shows the practical effect: for example, 4 SMSes hold on average 9 triples with the black-line strategies and 16 triples with the orange-line strategies.

17 SPARQL Compression: Experiment setup
- Set of real-world SPARQL queries
- 500 in total
- Two strategies:
  - Text compression
  - RDF compression

We performed a comparable experiment for SPARQL compression.

18 SPARQL Compression: Results
- Not always smaller results.
- 18%: text compression is best.
- 38%: no compression is best.
- 44%: RDF compression is best.
- Dynamically determine the best strategy.

The results are mixed. RDF compression scored best most often (44%), but in almost 40% of the cases it was best to send the query plain, without compression.
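Determining the best strategy dynamically can be as simple as trying every candidate and keeping the smallest. The sketch below does this for plain text and gzip. The RDF-based compressor is left as an optional hypothetical callable, `rdf_compress`, because its details are not given in this transcript, and the strategy labels are made up for the example.

```python
import gzip


def choose_encoding(query, rdf_compress=None):
    """Return (strategy name, payload bytes) for the smallest candidate.

    `rdf_compress` is a hypothetical callable taking the query string and
    returning bytes; it stands in for the RDF compression strategy.
    """
    raw = query.encode("utf-8")
    candidates = {"plain": raw, "text": gzip.compress(raw)}
    if rdf_compress is not None:
        candidates["rdf"] = rdf_compress(query)
    # On a tie, min() keeps the first entry, so "plain" wins over the rest.
    name = min(candidates, key=lambda k: len(candidates[k]))
    return name, candidates[name]


short_query = "ASK { ?s ?p ?o }"
long_query = "SELECT * WHERE { " + " ".join(
    f"?s{i} <http://example.org/vocab#property> ?o{i} ." for i in range(30)
) + " }"

print(choose_encoding(short_query)[0])
print(choose_encoding(long_query)[0])
```

The receiver has to know which strategy was picked, so a real message would also need a small marker for it. That marker is not shown here.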

19 Low-bandwidth Semantic Web
Practical Validation. Original: 14 SMSes, $0.196.

Now that we understand what is going on, let's see it in action. The query result contains 18 triples describing 3 animal diagnoses from DigiVet. We added the case-specific vocabulary to the set of vocabularies used in compression. The number of SMSes is calculated at 140 characters per SMS, because of the implementation used.

20 Low-bandwidth Semantic Web
Practical Validation. Serialization: 8 SMSes, $0.112.

21 Low-bandwidth Semantic Web
Practical Validation. Reasoning: 7 SMSes, $0.098.

22 Low-bandwidth Semantic Web
Practical Validation. Dictionary-encoding: 6 SMSes, $0.084.

23 Low-bandwidth Semantic Web
Practical Validation. Text compression: 3 SMSes, $0.042.
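The SMS counts and costs on the practical validation slides can be checked with a little arithmetic. The per-SMS price is not stated anywhere; $0.014 is inferred here from 14 SMSes costing $0.196 and is consistent with the other four figures. The helper names are hypothetical.

```python
import math

SMS_CHARS = 140            # characters per SMS, as stated for the implementation
PRICE_MILLIDOLLARS = 14    # inferred: $0.196 / 14 SMSes = $0.014 per SMS


def sms_count(n_chars):
    """Number of SMSes needed for a payload of n_chars characters."""
    return max(1, math.ceil(n_chars / SMS_CHARS))


def cost(n_sms):
    """Cost in dollars; integer arithmetic avoids rounding surprises."""
    return n_sms * PRICE_MILLIDOLLARS / 1000


# SMS counts reported on the practical validation slides.
steps = {
    "Original": 14,
    "Serialization": 8,
    "Reasoning": 7,
    "Dictionary-encoding": 6,
    "Text compression": 3,
}

for step, n in steps.items():
    saving = 1 - n / steps["Original"]
    print(f"{step}: {n} SMSes, ${cost(n):.3f}, {saving:.0%} fewer than the original")
```

Going from 14 to 3 SMSes means the same query result costs $0.042 instead of $0.196.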

24 Low-bandwidth Semantic Web
Conclusion
- Semantic Web can be used without a Web infrastructure.
- Specific Semantic Web features can be used for compression.
- SPARQL over SMS is free and open-source.

As we have seen, it is possible to use the Semantic Web without a Web infrastructure. The example used SMS, but the same concept can be applied to other types of networks. This allows for M2M knowledge sharing in the way we are used to: the Web. It also became clear that the specific properties of the Semantic Web can be used for compression.

