WP5 – Platform - Objective To design, build, establish, run, and sustain the Platform, and take primary responsibility for dissemination of the project.


1 WP5 – Platform - Objective
To design, build, establish, run, and sustain the Platform, and take primary responsibility for dissemination of the project outputs. This package consists of a number of elements creating what is envisioned as a ‘virtual knowledge network’, which for practical reasons will be built around the project website as a knowledge node.

2 What questions might a user pose to childhealthresearch.eu?
What is available that is applicable to Europe on:
1) Measles immunisation of transient populations
2) Parental discord and teenage self-harm
3) Nutritional imbalance in low-income families
4) Internet addiction - problem or fallacy
5) Safe play space and children's wellbeing in urban settings
6) Researchers with experience interviewing children about eating habits
7) Comparative study into immunisation uptake in European countries

3

4 Website subjects
● Web site
● Collaborative tools
● Information
● Upload a document
● Auto analysis
● Classifiers – taxonomy
● Funding
● Capacity
● Document repository
● Metadata store
● Metadata harvesting
● Find a document
● Classifiers
● Free text?
● What is returned

5

6 Two search functions: Repository and Site
● Site – full text search – general site, not the subject of this discussion
● Repository
  ● Simple
    ● Search for title, author, publisher, taxonomy term
    ● Text search for standard word combinations: (), +, -, |
  ● Advanced Search – Search Builder like PubMed
    ● Select from each taxonomy axis
    ● Combine items with AND, OR, AND NOT
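The advanced search builder described on this slide can be sketched as set operations over taxonomy selections. This is a minimal illustration, not the Platform's implementation: the axis names (`topic`, `population`, `language`), the sample records, and the helper names `select` and `build_search` are all assumptions made for the example.

```python
# Hypothetical in-memory records; axis names and terms are illustrative only.
RECORDS = {
    1: {"topic": {"immunisation"}, "population": {"migrants"}, "language": {"en"}},
    2: {"topic": {"immunisation"}, "population": {"general"}, "language": {"fr"}},
    3: {"topic": {"nutrition"}, "population": {"low income"}, "language": {"en"}},
}


def select(axis, term):
    """Return the ids of records tagged with `term` on the given taxonomy axis."""
    return {rid for rid, tags in RECORDS.items() if term in tags.get(axis, set())}


def build_search(first, *steps):
    """Combine selections left to right; each step is (operator, id_set)."""
    result = set(first)
    for op, ids in steps:
        if op == "AND":
            result &= ids
        elif op == "OR":
            result |= ids
        elif op == "AND NOT":
            result -= ids
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


# immunisation AND NOT migrants OR nutrition, evaluated left to right
hits = build_search(
    select("topic", "immunisation"),
    ("AND NOT", select("population", "migrants")),
    ("OR", select("topic", "nutrition")),
)
print(sorted(hits))
```

Evaluating strictly left to right mirrors a drag-and-drop builder where each new axis is combined with the result so far; a real builder might also need grouping with parentheses.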

7 Google Web Search: advanced search help
● The Basic search help article covers all the most common issues, but sometimes you need a little bit more power. This document will highlight the more advanced features of Google Web Search. Have in mind though that even very advanced searchers, such as the members of the search group at Google, use these features less than 5% of the time. Basic simple search is often enough. As always, we use square brackets [ ] to denote queries, so [ to be or not to be ] is an example of a query; [ to be ] or [ not to be ] are two examples of queries.

● Phrase search ("")
By putting double quotes around a set of words, you are telling Google to consider the exact words in that exact order without any change. Google already uses the order and the fact that the words are together as a very strong signal and will stray from it only for a good reason, so quotes are usually unnecessary. By insisting on phrase search you might be missing good results accidentally. For example, a search for [ "Alexander Bell" ] (with quotes) will miss the pages that refer to Alexander G. Bell.

● Search within a specific website (site:)
Google allows you to specify that your search results must come from a given website. For example, the query [ iraq site:nytimes.com ] will return pages about Iraq but only from nytimes.com. The simpler queries [ iraq nytimes.com ] or [ iraq New York Times ] will usually be just as good, though they might return results from other sites that mention the New York Times. You can also specify a whole class of sites, for example [ iraq site:.gov ] will return results only from a .gov domain and [ iraq site:.iq ] will return results only from Iraqi sites.

● Terms you want to exclude (-)
Attaching a minus sign immediately before a word indicates that you do not want pages that contain this word to appear in your results. The minus sign should appear immediately before the word and should be preceded with a space. For example, in the query [ anti-virus software ], the minus sign is used as a hyphen and will not be interpreted as an exclusion symbol; whereas the query [ anti-virus -software ] will search for the words 'anti-virus' but exclude references to software. You can exclude as many words as you want by using the - sign in front of all of them, for example [ jaguar -cars -football -os ]. The - sign can be used to exclude more than just words. For example, place a hyphen before the 'site:' operator (without a space) to exclude a specific site from your search results.

● Fill in the blanks (*)
The *, or wildcard, is a little-known feature that can be very powerful. If you include * within a query, it tells Google to try to treat the star as a placeholder for any unknown term(s) and then find the best matches. For example, the search [ Google * ] will give you results about many of Google's products (go to next page and next page -- we have many products). The query [ Obama voted * on the * bill ] will give you stories about different votes on different bills. Note that the * operator works only on whole words, not parts of words.

● Search exactly as is (+)
Google employs synonyms automatically, so that it finds pages that mention, for example, childcare for the query [ child care ] (with a space), or California history for the query [ ca history ]. But sometimes Google helps out a little too much and gives you a synonym when you don't really want it. By attaching a + immediately before a word (remember, don't add a space after the +), you are telling Google to match that word precisely as you typed it. Putting double quotes around a single word will do the same thing.

● The OR operator
Google's default behavior is to consider all the words in a search. If you want to specifically allow either one of several words, you can use the OR operator (note that you have to type 'OR' in ALL CAPS). For example, [ San Francisco Giants 2004 OR 2005 ] will give you results about either one of these years, whereas [ San Francisco Giants 2004 2005 ] (without the OR) will show pages that include both years on the same page. The symbol | can be substituted for OR. (The AND operator, by the way, is the default, so it is not needed.)

● Exceptions
Search is rarely absolute. Search engines use a variety of techniques to imitate how people think and to approximate their behavior. As a result, most rules have exceptions. For example, the query [ for better or for worse ] will not be interpreted by Google as an OR query, but as a phrase that matches a (very popular) comic strip. Google will show calculator results for the query [ 34 * 87 ] rather than use the 'Fill in the blanks' operator. Both cases follow the obvious intent of the query. Here is a list of exceptions to some of the rules and guidelines that were mentioned in this and the Basic Search Help article:

8 Simple Search

9 Search results Filtered out keyword (NOT immigrants)

10 Search results Grouped by language

11 Search results Grouped by year of publication

12 'Build a Search'
Example from another application: drag axis onto workspace – select items – add Boolean operator – drag next axis, etc.

13

14 Add a publication
● Upload file
● Auto analyse
  ● Provides information to assist the classifier
  ● Can drag from 'auto analysis' to Classification form
● Manual 'tag' based on Taxonomy selectors
  ● At least one selection per major axis
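The rule "at least one selection per major axis" can be checked before a publication is saved. The sketch below is only an illustration: the slides do not list the taxonomy's major axes, so the names in `MAJOR_AXES` and the function `missing_axes` are hypothetical.

```python
# Hypothetical major axes; the real taxonomy axes are not listed in the slides.
MAJOR_AXES = ("topic", "population", "study_type")


def missing_axes(classification):
    """Return the major axes that have no selected term."""
    return [axis for axis in MAJOR_AXES if not classification.get(axis)]


# A draft classification: 'population' is empty and 'study_type' is absent.
draft = {"topic": ["immunisation"], "population": [], "language": ["en"]}
print(missing_axes(draft))
```

An upload form could refuse to save while this list is non-empty and point the user at the axes still to be tagged.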

15 Three sources of 'papers' or research
1 Adding a paper from your local PC
2 Linking to a paper stored on another system – may sometimes require User Id and Password to access
3 Referencing the metadata of a paper – from our own metadata store – allows us to extend the classification from Riche

16 Auto Analysis Classification workspace Document view after upload

17 Auto Analysis Classification workspace Document view – accept some suggested metadata

18 Publication metadata view

19

20 Language ?? ● European multilingual thesaurus on health promotion in 12 languages.

21 Auto analysis and suggested classifiers
➢ Term extraction can be performed to provide quick insight on what a document is about.
➢ On a large site with a lot of content and tags (or subjects in the Plone lingo) it might be difficult to assign tags to new content. In this case, a trained classifier could provide useful suggestions to an editor responsible for tagging content.
➢ Clustering can help you organize unclassified content into groups.

22 How does it work?
● POS taggers: utilities for classifying words in a document as Parts Of Speech. Two are provided at the moment, a Penn TreeBank tagger and a trigram tagger. Both can be trained with a language other than English, which is what we do here.
● Term extractors: utilities responsible for extracting the important terms from a document. The extractor we use here assumes that only nouns matter in a document and uses a POS tagger to find the nouns used most often. For details please look at the code and the tests.
● Content classifiers: utilities that can tag content in predefined categories. Here, a naive Bayes classifier is used. Basically, the classifier looks at already tagged content, performs term extraction and trains itself using the terms and tags as input. Then, for new content, the classifier will provide suggestions for tags according to the extracted terms of the content.
● Clusterers: utilities that, without prior knowledge of content classification, can group content according to feature similarity. At the moment NLTK's k-means clusterer is used.
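The train-then-suggest loop of the content classifier can be sketched in plain Python. This is a simplified stand-in, not the code the slide refers to: it does not use NLTK or a POS tagger, `extract_terms` keeps frequent non-stopword tokens rather than nouns, and the class name `NaiveBayesTagger`, the stopword list, and the training sentences are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"the", "a", "of", "in", "and", "to", "for", "on", "among"}


def extract_terms(text, top_n=5):
    """Crude term extractor: the most frequent non-stopword tokens.

    The extractor described on the slide keeps nouns found by a POS tagger;
    this stand-in skips the tagging step.
    """
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return [term for term, _ in Counter(tokens).most_common(top_n)]


class NaiveBayesTagger:
    """Multinomial naive Bayes over extracted terms, with add-one smoothing."""

    def __init__(self):
        self.tag_counts = Counter()               # documents seen per tag
        self.term_counts = defaultdict(Counter)   # term frequencies per tag
        self.vocabulary = set()

    def train(self, tagged_docs):
        for text, tag in tagged_docs:
            terms = extract_terms(text)
            self.tag_counts[tag] += 1
            self.term_counts[tag].update(terms)
            self.vocabulary.update(terms)

    def suggest(self, text):
        """Return the known tags, most likely first."""
        terms = extract_terms(text)
        total_docs = sum(self.tag_counts.values())
        scores = {}
        for tag, n_docs in self.tag_counts.items():
            total_terms = sum(self.term_counts[tag].values())
            score = math.log(n_docs / total_docs)  # log prior
            for term in terms:
                count = self.term_counts[tag][term]
                score += math.log((count + 1) / (total_terms + len(self.vocabulary)))
            scores[tag] = score
        return sorted(scores, key=scores.get, reverse=True)


tagger = NaiveBayesTagger()
tagger.train([
    ("Measles vaccine uptake and immunisation schedules", "immunisation"),
    ("Vaccine coverage for measles among children", "immunisation"),
    ("Diet and nutrition in low income families", "nutrition"),
    ("Nutrition, diet and obesity in school children", "nutrition"),
])
print(tagger.suggest("measles vaccine campaign"))
```

The ranked list is what an editor would see as tag suggestions; with only four training documents the scores are illustrative rather than reliable.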

23 Document repository ● Sometimes we must store the document

24 What is the MOAI Server?
MOAI is an open access server platform for institutional repositories. The server aggregates content from disparate sources, transforms it, stores it in a database, and (re)publishes the content in one or many OAI feeds. Each feed has its own configuration. The server has a flexible system for combining records into sets and uses these sets in the feed configuration. MOAI also comes with a simple yet flexible authentication scheme that can easily be customized. Besides providing authentication for the feeds, the authentication also controls access to the assets. MOAI is a standalone system that can be used in combination with any repository software that comes with an OAI feed, such as Fedora Commons, EPrints or DSpace. It can also be used directly with an SQL database or just a folder of XML files.

Interaction with other systems and websites
Feeds from MOAI can be picked up by any system or search engine that understands OAI metadata. If the system is a content management system and has harvesting capabilities, the feed data can be stored, presented, and searched within a website. Silva, a powerful CMS for organizations that manage complex sites, has OAI Pack extensions that provide these capabilities.
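How a harvester might consume such an OAI feed can be sketched with the standard OAI-PMH `ListRecords` request and Dublin Core (`oai_dc`) metadata. This is a hedged illustration, not MOAI's own API: the base URL `https://repo.example.org/oai`, the set name `health`, the sample response, and the helper names are assumptions, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard OAI-PMH and Dublin Core XML namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a feed."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hand-written sample response (hypothetical record), trimmed to the
# elements this sketch reads.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Immunisation uptake in Europe</dc:title>
          <dc:language>en</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def parse_records(xml_text):
    """Extract identifier and Dublin Core titles from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [el.text for el in record.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    return records


url = list_records_url("https://repo.example.org/oai", set_spec="health")
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```

A real harvester would also follow `resumptionToken` elements to page through large feeds and handle deleted records; both are omitted here for brevity.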

