Generalized Vector Space Model


1 Generalized Vector Space Model. Definition: Let k_i be a vector associated with the index term k_i. Independence of index terms in the vector model implies that the set of vectors {k_1, k_2, …, k_t} is linearly independent and forms a basis for the subspace of interest. The dimension of this space is the number t of index terms in the collection.

2 An example of independent vectors: V_1 = (1, 0, 0), V_2 = (0, 1, 0), V_3 = (0, 0, 1). V_1 · V_2 = 0+0+0 = 0, and in general V_i · V_j = 0 for i ≠ j. Each element represents a keyword. Different keywords are treated as totally unrelated items. This is not reasonable, since keywords are sometimes related.

3 Definition: Given the set {k_1, k_2, …, k_t} of index terms in a collection, as before, let w_{i,j} be the weight associated with the term-document pair [k_i, d_j]. If the w_{i,j} weights are all binary, then all possible patterns of term co-occurrence (inside documents) can be represented by a set of 2^t minterms given by m_1 = (0, 0, …, 0), m_2 = (1, 0, …, 0), …, m_{2^t} = (1, 1, …, 1). Let g_i(m_j) return the weight {0, 1} of the index term k_i in the minterm m_j.
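The minterm numbering in this definition can be sketched in Java as follows. This is a minimal illustration, assuming that term k_1 is the fastest-changing position (so that m_2 = (1, 0, …, 0), as on the slide); the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Enumerates the 2^t binary minterms for t index terms (illustrative sketch). */
final class MintermEnumeration {

    /**
     * Returns minterm m_j (1-based j) as an array of t binary weights,
     * where entry i-1 holds g_i(m_j). Term k_1 is the fastest-changing
     * position, so m_1 = (0,...,0), m_2 = (1,0,...,0), m_{2^t} = (1,...,1).
     */
    static int[] minterm(int j, int t) {
        int pattern = j - 1;               // m_1 corresponds to bit pattern 0
        int[] g = new int[t];
        for (int i = 0; i < t; i++) {
            g[i] = (pattern >> i) & 1;     // this is g_{i+1}(m_j)
        }
        return g;
    }

    public static void main(String[] args) {
        int t = 3;                         // three index terms give 2^3 = 8 minterms
        for (int j = 1; j <= (1 << t); j++) {
            System.out.println("m_" + j + " = " + Arrays.toString(minterm(j, t)));
        }
    }
}
```

With t = 3 the loop prints eight patterns, starting at (0, 0, 0) and ending at (1, 1, 1).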

4 Definition: Let us define the following set of 2^t vectors: m_1 = (0, 0, …, 0, 1), m_2 = (0, 0, …, 1, 0), …, m_{2^t} = (1, 0, …, 0, 0), where each vector m_i is associated with the respective minterm m_i, and m_i · m_j = 0 for all i ≠ j.

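5 [Items 1.1 and 1.2; slide content not preserved in this transcript]

6 [Slide content not preserved in this transcript]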
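The worked example on the later slides uses term vectors of the following form. This is a reconstruction that assumes the standard GVSM construction and the notation w_{i,j}, g_i, and m_r from the definitions above; it is not a copy of any slide. The symbol g_l(d_j), meaning 1 when term k_l occurs in document d_j and 0 otherwise, is introduced here for illustration.

```latex
\[
c_{i,r} \;=\; \sum_{\substack{d_j \,:\, g_l(d_j) = g_l(m_r) \\ \text{for all } l}} w_{i,j},
\qquad
\vec{k}_i \;=\; \frac{\sum_{r} c_{i,r}\,\vec{m}_r}{\sqrt{\sum_{r} c_{i,r}^{2}}}
\]
```

Here c_{i,r} adds up the weights of term k_i over all documents whose co-occurrence pattern equals minterm m_r. When g_i(m_r) = 0, every matching document has w_{i,j} = 0, so that coefficient is 0 and only minterms containing k_i contribute to the vector.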

7 An example for the Generalized Vector Space Model. Suppose that the system has 12 documents and 4 keywords:
D1=(2, 1, 0, 0), D2=(5, 1, 0, 0), D3=(1, 1, 1, 1), D4=(0, 0, 2, 2), D5=(0, 1, 1, 2), D6=(0, 0, 1, 1), D7=(0, 0, 1, 0), D8=(1, 1, 0, 0), D9=(2, 1, 1, 1), D10=(0, 2, 2, 2), D11=(1, 0, 2, 0), D12=(0, 0, 2, 1).
Minterms: the 6 co-occurrence patterns that occur in these documents are used as independent vectors to form a basis:
m1=(1, 1, 0, 0), m2=(1, 1, 1, 1), m3=(0, 0, 1, 1), m4=(0, 1, 1, 1), m5=(0, 0, 1, 0), m6=(1, 0, 1, 0).
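One way to see where the six minterms come from is to binarize each document vector and group documents by the resulting pattern. The sketch below does this for the 12 documents of the example; the class and method names are hypothetical, and treating any positive weight as 1 is an assumption consistent with the minterms listed on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Groups the example documents by their binary term co-occurrence pattern. */
final class ActiveMinterms {

    // DOCS[j] holds the four keyword weights of document D(j+1).
    static final int[][] DOCS = {
        {2, 1, 0, 0}, {5, 1, 0, 0}, {1, 1, 1, 1}, {0, 0, 2, 2},
        {0, 1, 1, 2}, {0, 0, 1, 1}, {0, 0, 1, 0}, {1, 1, 0, 0},
        {2, 1, 1, 1}, {0, 2, 2, 2}, {1, 0, 2, 0}, {0, 0, 2, 1}
    };

    /** Binarizes a weight vector (positive weight becomes 1) and renders it as text. */
    static String pattern(int[] doc) {
        int[] bits = new int[doc.length];
        for (int i = 0; i < doc.length; i++) {
            bits[i] = doc[i] > 0 ? 1 : 0;
        }
        return Arrays.toString(bits);
    }

    /** Maps each pattern to the 1-based document numbers that have it, in order of first appearance. */
    static Map<String, List<Integer>> group(int[][] docs) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int j = 0; j < docs.length; j++) {
            groups.computeIfAbsent(pattern(docs[j]), p -> new ArrayList<>()).add(j + 1);
        }
        return groups;
    }

    public static void main(String[] args) {
        group(DOCS).forEach((p, docs) -> System.out.println(p + " <- documents " + docs));
    }
}
```

Only 6 of the 2^4 = 16 possible patterns occur, which is why the example works in a 6-dimensional space.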

8 Generalized Vector Space Model. Independent vectors: V1=(1, 0, 0, 0, 0, 0), V2=(0, 1, 0, 0, 0, 0), V3=(0, 0, 1, 0, 0, 0), V4=(0, 0, 0, 1, 0, 0), V5=(0, 0, 0, 0, 1, 0), V6=(0, 0, 0, 0, 0, 1). Vi represents minterm mi. Each pair Vi, Vj with i ≠ j is orthogonal (dot product = 0). The four keywords k1, k2, k3, and k4 are represented by a combination of the independent vectors.

9 Generalized Vector Space Model. The four keywords k1, k2, k3, and k4 are represented by a combination of the independent vectors.
k1 = (c_{1,1} V1 + c_{1,2} V2 + c_{1,3} V3 + c_{1,4} V4 + c_{1,5} V5 + c_{1,6} V6) / C, where
c_{1,1} = w_{1,1} + w_{1,2} + w_{1,8} = 2+5+1 = 8 (D1, D2, and D8 have minterm m1),
c_{1,2} = w_{1,3} + w_{1,9} = 1+2 = 3 (D3 and D9 have minterm m2),
c_{1,3} = w_{1,4} + w_{1,6} + w_{1,12} = 0+0+0 = 0 (D4, D6, and D12 have minterm m3),
c_{1,4} = w_{1,5} + w_{1,10} = 0+0 = 0,
c_{1,5} = w_{1,7} = 0,
c_{1,6} = w_{1,11} = 1,
C = (c_{1,1}^2 + c_{1,2}^2 + c_{1,3}^2 + c_{1,4}^2 + c_{1,5}^2 + c_{1,6}^2)^0.5

10 Generalized Vector Space Model.
k2 = (c_{2,1} V1 + c_{2,2} V2 + c_{2,3} V3 + c_{2,4} V4 + c_{2,5} V5 + c_{2,6} V6) / C, where
c_{2,1} = w_{2,1} + w_{2,2} + w_{2,8} = 1+1+1 = 3 (D1, D2, and D8 have minterm m1),
c_{2,2} = w_{2,3} + w_{2,9} = 1+1 = 2 (D3 and D9 have minterm m2),
c_{2,3} = w_{2,4} + w_{2,6} + w_{2,12} = 0+0+0 = 0 (D4, D6, and D12 have minterm m3),
c_{2,4} = w_{2,5} + w_{2,10} = 1+2 = 3,
c_{2,5} = w_{2,7} = 0,
c_{2,6} = w_{2,11} = 0,
C = (c_{2,1}^2 + c_{2,2}^2 + c_{2,3}^2 + c_{2,4}^2 + c_{2,5}^2 + c_{2,6}^2)^0.5

11 Generalized Vector Space Model.
k3 = (c_{3,1} V1 + c_{3,2} V2 + c_{3,3} V3 + c_{3,4} V4 + c_{3,5} V5 + c_{3,6} V6) / C, where
c_{3,1} = w_{3,1} + w_{3,2} + w_{3,8} = 0+0+0 = 0 (D1, D2, and D8 have minterm m1),
c_{3,2} = w_{3,3} + w_{3,9} = 1+1 = 2 (D3 and D9 have minterm m2),
c_{3,3} = w_{3,4} + w_{3,6} + w_{3,12} = 2+1+2 = 5 (D4, D6, and D12 have minterm m3),
c_{3,4} = w_{3,5} + w_{3,10} = 1+2 = 3,
c_{3,5} = w_{3,7} = 1,
c_{3,6} = w_{3,11} = 2,
C = (c_{3,1}^2 + c_{3,2}^2 + c_{3,3}^2 + c_{3,4}^2 + c_{3,5}^2 + c_{3,6}^2)^0.5

12 Generalized Vector Space Model.
k4 = (c_{4,1} V1 + c_{4,2} V2 + c_{4,3} V3 + c_{4,4} V4 + c_{4,5} V5 + c_{4,6} V6) / C, where
c_{4,1} = w_{4,1} + w_{4,2} + w_{4,8} = 0+0+0 = 0 (D1, D2, and D8 have minterm m1),
c_{4,2} = w_{4,3} + w_{4,9} = 1+1 = 2 (D3 and D9 have minterm m2),
c_{4,3} = w_{4,4} + w_{4,6} + w_{4,12} = 2+1+1 = 4 (D4, D6, and D12 have minterm m3),
c_{4,4} = w_{4,5} + w_{4,10} = 2+2 = 4,
c_{4,5} = w_{4,7} = 0,
c_{4,6} = w_{4,11} = 0,
C = (c_{4,1}^2 + c_{4,2}^2 + c_{4,3}^2 + c_{4,4}^2 + c_{4,5}^2 + c_{4,6}^2)^0.5
Each keyword vector ki is thus converted from a vector of length 4 into a vector of length 6.
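The coefficient sums for k1 to k4 can be reproduced mechanically from the 12 documents and 6 minterms of the example. The following is a minimal sketch, not the presenter's code: the class and method names are hypothetical, and the final cosine between two keyword vectors is an added illustration of why the new representation no longer treats keywords as unrelated.

```java
import java.util.Arrays;

/** Recomputes the GVSM coefficients c_{i,r} of the worked example. */
final class GvsmTermVectors {

    // DOCS[j][i] is w_{i+1, j+1}: the weight of keyword k(i+1) in document D(j+1).
    static final int[][] DOCS = {
        {2, 1, 0, 0}, {5, 1, 0, 0}, {1, 1, 1, 1}, {0, 0, 2, 2},
        {0, 1, 1, 2}, {0, 0, 1, 1}, {0, 0, 1, 0}, {1, 1, 0, 0},
        {2, 1, 1, 1}, {0, 2, 2, 2}, {1, 0, 2, 0}, {0, 0, 2, 1}
    };

    // MINTERMS[r] is m(r+1) from the example.
    static final int[][] MINTERMS = {
        {1, 1, 0, 0}, {1, 1, 1, 1}, {0, 0, 1, 1},
        {0, 1, 1, 1}, {0, 0, 1, 0}, {1, 0, 1, 0}
    };

    /** True when the document's binarized weights equal the minterm pattern. */
    static boolean matches(int[] doc, int[] minterm) {
        for (int l = 0; l < doc.length; l++) {
            if ((doc[l] > 0 ? 1 : 0) != minterm[l]) {
                return false;
            }
        }
        return true;
    }

    /** Unnormalized coefficients (c_{i,1}, ..., c_{i,6}) for the 0-based term index. */
    static int[] coefficients(int term) {
        int[] c = new int[MINTERMS.length];
        for (int r = 0; r < MINTERMS.length; r++) {
            for (int[] doc : DOCS) {
                if (matches(doc, MINTERMS[r])) {
                    c[r] += doc[term];     // add w_{i,j} for every document with minterm m_r
                }
            }
        }
        return c;
    }

    /** Normalization constant C = (sum of squared coefficients)^0.5. */
    static double norm(int[] c) {
        double sum = 0.0;
        for (int x : c) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    /** Cosine between the normalized vectors of two keywords (0-based indices). */
    static double correlation(int a, int b) {
        int[] ca = coefficients(a);
        int[] cb = coefficients(b);
        double dot = 0.0;
        for (int r = 0; r < ca.length; r++) {
            dot += (double) ca[r] * cb[r];
        }
        return dot / (norm(ca) * norm(cb));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            int[] c = coefficients(i);
            System.out.println("k" + (i + 1) + ": c = " + Arrays.toString(c) + ", C = " + norm(c));
        }
        System.out.println("cos(k1, k2) = " + correlation(0, 1));
    }
}
```

In the original vector model k1 and k2 are orthogonal. Here their cosine is nonzero because both keywords occur together in the documents that carry minterms m1 and m2.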

13 Google Web API See: http://www.google.com/apis/

14 Concept: With the Google Web APIs service, software developers can query more than 3 billion web documents directly from their own computer programs. Google uses the SOAP and WSDL standards, so a developer can program in his or her favorite environment, such as Java, Perl, or Visual Studio .NET.

15 Google Web APIs provide three services:
1. Search for relevant web pages according to the keyword(s) the user supplies.
2. Return the cached copy of a web page for the URL the user supplies.
3. Suggest a spelling correction for the word the user inputs.

16 Search Requests: Search requests submit a query string and a set of parameters to the Google Web APIs service and receive in return a set of search results. Search results are derived from Google’s index of over 2 billion Web pages.

17 Search Request Format:
Name          Description
key           Provided by Google. Google uses the key for authentication and logging.
q             Query string.
start         Zero-based index of the first desired result.
maxResults    Number of results desired per query. The maximum value per query is 10.
(continued on the next slide)

18 Search Request Format (continued):
Name          Description
filter        Activates or deactivates automatic results filtering, which hides very similar results and results that all come from the same Web host.
restrict      Restricts the search to a subset of the Google Web index, such as a topic like “Linux”.
safeSearch    A Boolean value which enables filtering of adult content in the search results.
lr            Language Restrict. Restricts the search to documents within one or more languages.
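Because at most 10 results are returned per query, retrieving a longer result list means issuing several requests with increasing values of the start parameter. The sketch below only computes the zero-based start value and the result count for each request; it calls no Google API, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Computes the start and maxResults values needed to page through search results. */
final class ResultPaging {

    // Limit stated in the request format: at most 10 results per query.
    static final int MAX_RESULTS_PER_QUERY = 10;

    /** Zero-based start values for the requests needed to fetch `wanted` results. */
    static List<Integer> startOffsets(int wanted) {
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start < wanted; start += MAX_RESULTS_PER_QUERY) {
            starts.add(start);
        }
        return starts;
    }

    /** Number of results to request for the page that begins at `start`. */
    static int pageSize(int start, int wanted) {
        return Math.min(MAX_RESULTS_PER_QUERY, wanted - start);
    }

    public static void main(String[] args) {
        int wanted = 25;
        for (int start : startOffsets(wanted)) {
            System.out.println("start=" + start + ", maxResults=" + pageSize(start, wanted));
        }
    }
}
```

For 25 results this yields three requests, the last of which asks for only 5 results.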

19 Search Results Format:
Search Response: Each time you issue a search request to the Google service, a response is returned to you. (The meanings of the returned values are described on the following slides.)
Result Element

20 Search Response:
--A Boolean value indicating whether filtering was performed on the search results
--A text string intended for displaying to an end user
--The estimated total number of results that exist for the query

21 Search Response (continued):
--A Boolean value indicating that the estimated value is actually the exact value
--An array of […] items. This corresponds to the actual list of search results
--This is the value of […] for the search request

22 Search Response (continued):
--Indicates the index (1-based) of the first search result in […]
--Indicates the index (1-based) of the last search result in […]
--A text string intended for displaying to the end user. It provides instructive suggestions on how to use Google

23 Search Response (continued):
--An array of […] items
--Text, floating-point number indicating the total server time to return the search results, measured in seconds

24 Cache Requests: Cache requests submit a URL to the Google Web APIs service and receive in return the contents of the URL when Google’s crawlers last visited the page.

25 Spelling Requests: Spelling requests submit a query to the Google Web APIs service and receive in return a suggested spelling correction for the query (if available).

26 Java Implementation: Google provides a Java implementation of the Google Web APIs. We will take a look at it and then walk through an example.

27 The Java classes:
com.google.soap.search.GoogleSearch
com.google.soap.search.GoogleSearchResult
com.google.soap.search.GoogleSearchResultElement
com.google.soap.search.GoogleSearchFault
com.google.soap.search.GoogleSearchDirectoryCategory

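28 Usage Demo:
    import com.google.soap.search.*;

    // Excerpt from a method body. clientKey, directive, and directiveArg are
    // assumed to be supplied by the caller (for example, from the command line).
    GoogleSearch s = new GoogleSearch();
    s.setKey(clientKey);
    try {
        if (directive.equalsIgnoreCase("search")) {
            // Run a search and print the whole result object.
            s.setQueryString(directiveArg);
            GoogleSearchResult r = s.doSearch();
            System.out.println(r.toString());
        } else if (directive.equalsIgnoreCase("cached")) {
            // Fetch the cached copy of the page at the given URL.
            byte[] cachedBytes = s.doGetCachedPage(directiveArg);
            String cachedString = new String(cachedBytes);
            System.out.println(cachedString);
        } else if (directive.equalsIgnoreCase("spell")) {
            // Ask for a spelling suggestion for the given phrase.
            System.out.println("Spelling suggestion:");
            String suggestion = s.doSpellingSuggestion(directiveArg);
            System.out.println(suggestion);
        }
    } catch (GoogleSearchFault f) {
        // The slide omits the handler; this one is added so the try block is complete.
        System.out.println("The call to the Google Web APIs failed:");
        System.out.println(f.toString());
    }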

29 How to build the executable file:
1. Write your own code in the appropriate place in GoogleAPIDemo.java.
2. Compile GoogleAPIDemo.java.
3. Add GoogleAPIDemo$1.class and GoogleAPIDemo.class (both generated in step 2) to the directory “com.google.soap.search” of GoogleAPI.jar using WinRAR.
4. Click exec.bat to run the program.

30 Example program: You can download the executable files and source files of the example from Dr. Wang’s home page.

