Search for personal information using Yahoo BOSS, by Evgeny Dosychev and Dmitry Kichin. Supervisor: Eddie Bortnikov
BOSS
- Yahoo! Search BOSS (Build your Own Search Service) is a Yahoo! initiative that gives developers free access to the Yahoo! Search index.
- The results can be fed into the developer's application, which can then process them according to its needs.
- Up to 500 results can be retrieved.
(Source: Wikipedia)
HomePage
- This Java desktop application automatically creates an HTML page that looks like a personal home page.
- The information is collected from the web using a general-purpose search engine.
- We focused on creating pages for researchers and academic staff.
- The personal details are retrieved from publications and scientific papers.
HomePage functionality
- Gets the name of the search target from the user.
- Searches the web using Yahoo! BOSS.
- Downloads and parses PDF documents and images.
- Divides the information into clusters.
- Lets the user choose the clusters that are related to the person.
- Produces an HTML page with all the details.
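The last step, producing the HTML page, can be sketched as follows. This is a minimal illustration and not the project's actual code: the class name `HomePageRenderer`, the methods `escape` and `render`, and the page layout (a heading with the person's name followed by a list of details) are all assumptions made for the example.

```java
import java.util.List;

// Hypothetical helper, not taken from the project: renders a name and
// a list of collected details as a very simple HTML page.
final class HomePageRenderer {

    /** Escapes the characters that are special in HTML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds the page: the name as title and heading, one list item per detail. */
    static String render(String name, List<String> details) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html>\n<head><title>")
            .append(escape(name))
            .append("</title></head>\n<body>\n");
        html.append("<h1>").append(escape(name)).append("</h1>\n<ul>\n");
        for (String detail : details) {
            html.append("  <li>").append(escape(detail)).append("</li>\n");
        }
        html.append("</ul>\n</body>\n</html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        // Example data is invented for the demonstration.
        String page = render("Jane Doe",
                List.of("Email: jane.doe@cs.example.edu",
                        "Publication: Search & Retrieval <draft>"));
        System.out.print(page);
    }
}
```

Escaping matters here because the details come from arbitrary web documents and may contain characters such as `<` or `&` that would otherwise break the generated page.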
Clustering algorithm
- Name ambiguity is very hard to resolve automatically, so we leave this task to the user.
- Each information item is identified by its key (currently: the email address of the document it appears in). A "cluster" is the set of all information items that share the same key.
- The user chooses the clusters that seem to be related to the person. The result page is produced from the chosen clusters.
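The key-based grouping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the class `EmailClusterer`, its methods, and the simplified email pattern are hypothetical, each document's text stands in for an information item, and the first email address found in a document is taken as its key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: groups document texts by the email address they contain.
final class EmailClusterer {

    // Deliberately simple pattern (an assumption); real addresses can be more complex.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /** Returns the first email address found in the text, lower-cased, if any. */
    static Optional<String> firstEmail(String text) {
        Matcher m = EMAIL.matcher(text);
        return m.find()
                ? Optional.of(m.group().toLowerCase(Locale.ROOT))
                : Optional.empty();
    }

    /** Groups documents by email key; documents without an address are skipped. */
    static Map<String, List<String>> cluster(List<String> documents) {
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        for (String doc : documents) {
            firstEmail(doc).ifPresent(key ->
                    clusters.computeIfAbsent(key, k -> new ArrayList<>()).add(doc));
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Example documents are invented for the demonstration.
        List<String> docs = List.of(
                "Paper A. Contact: J.Smith@cs.example.edu",
                "Paper B. Contact: john.smith@corp.example.com",
                "Paper C. Contact: j.smith@cs.example.edu");
        cluster(docs).forEach((key, items) ->
                System.out.println(key + " -> " + items.size() + " item(s)"));
    }
}
```

In this sketch two people who share a name but use different addresses end up in different clusters, and the user then picks the clusters that belong to the intended person. Skipping documents without an address is a choice made for the example only.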
Conclusions
- We learned the principles of the BOSS project and used the power that it provides.
- Perhaps the main challenge was semantic parsing (finding information in the text). Semantic parsing by itself requires time and resources.
- We prepared a well-designed object-oriented infrastructure for the task. It can serve as a good base for adding more algorithms that find additional information in the texts.