CIS607, Fall 2005
Semantic Information Integration
Presentation by Dayi Zhou
Week 4 (Oct. 19)

Questions from Homework 2
About the algorithm:
– Can you go over exactly how DCM picks possible positive and negative correlations between schemas? (Amanda)
– I would like to know more about what the “sparseness problem” and the “rare attribute problem” are. (Paea)
– I think the matching selection is interesting; their algorithm is quite simple but seems to work. I wonder, though, whether it would work as well in applications other than Web interfaces. Web interfaces seem simpler than real schemas, which have more complex matchings/mappings. But perhaps a user interface, combined with their heuristics, could produce very good results? (Paea)
– What can be done with smaller data sets to improve the likelihood that truly semantically related matchings result from choosing $M_1$ over $M_2$ when $m_n(M_1) > m_n(M_2)$, where $m_n$ is the negative correlation measure?
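
The first two questions can be made concrete with a small sketch. This is a minimal Python illustration in the spirit of DCM's pairwise correlation step, not the authors' implementation: it assumes each query interface has already been reduced to a set of attribute names, scores each attribute pair with an H-measure of the form H = f01·f10 / (f+1·f1+) computed from the 2x2 contingency counts, and takes m_p = 1 - H and m_n = H. The paper's exact definitions may add further conditions. The example schemas, the thresholds `t_p` and `t_n`, and all function names are hypothetical. Because f00 (schemas containing neither attribute) never enters H, the score is not swamped by the many schemas in which both attributes are absent, which, as we understand it, is the point of the sparseness discussion.

```python
from itertools import combinations

# Hypothetical book-search interfaces, each reduced to a set of attribute names.
SCHEMAS = [
    {"author", "title"},
    {"author", "title", "isbn"},
    {"first name", "last name", "title"},
    {"first name", "last name", "isbn"},
    {"author", "isbn"},
    {"first name", "last name"},
]


def contingency(schemas, a, b):
    """Return the 2x2 counts (f11, f10, f01, f00) for attributes a and b."""
    f11 = sum(1 for s in schemas if a in s and b in s)
    f10 = sum(1 for s in schemas if a in s and b not in s)
    f01 = sum(1 for s in schemas if a not in s and b in s)
    f00 = len(schemas) - f11 - f10 - f01  # schemas containing neither attribute
    return f11, f10, f01, f00


def h_measure(f11, f10, f01):
    """H = f01*f10 / (f+1 * f1+). f00 is deliberately not an argument."""
    f1_plus = f11 + f10  # schemas containing a
    f_plus1 = f11 + f01  # schemas containing b
    return (f01 * f10) / (f_plus1 * f1_plus)


def candidate_pairs(schemas, t_p, t_n):
    """Split attribute pairs into positive (grouping) and negative (synonym) candidates."""
    attrs = sorted(set().union(*schemas))
    positive, negative = [], []
    for a, b in combinations(attrs, 2):
        f11, f10, f01, _ = contingency(schemas, a, b)
        # Denominators are > 0 because every attribute occurs in some schema.
        h = h_measure(f11, f10, f01)
        m_p, m_n = 1 - h, h
        if m_p > t_p:
            positive.append((a, b, round(m_p, 2)))
        if m_n > t_n:
            negative.append((a, b, round(m_n, 2)))
    return positive, negative


if __name__ == "__main__":
    pos, neg = candidate_pairs(SCHEMAS, t_p=0.9, t_n=0.9)
    print("positive:", pos)
    print("negative:", neg)
```

On these made-up schemas, only `first name`/`last name` passes the positive threshold (a grouping candidate), and `author` is negatively correlated with both of them (synonym candidates), which is the shape of the {author} = {first name, last name} matching.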

Questions from Homework 2 (cont’d)
About the data mining approach:
– I don’t fully understand the data preparation part. Do the authors take form fields as attribute names and the query results as field data (i.e., domain data)? If so, how can they explore all existing data? Do they try all combinations in the web forms? (Zebin)
– Another possibility is to take form fields as attribute names and the database data as domain data, but this is dubious: why not study database schema matching directly? A database schema should be closer to the domain semantics. One justification for form extraction is that database schemas are hard to obtain in practice (they are mostly hidden), but then the authors should explain how they obtain domain data through web forms and why such data correspond to domain data. Consequently, they can’t use database data to evaluate their matching algorithms (unless they assume a “perfect” data preparation algorithm). Moreover, some form data could apparently be a view generated from several other fields, and the authors don’t mention such correlations. Furthermore, I am slightly dubious about the authors’ assumption that most attributes co-occurring in a page can’t be synonyms. Finally, web form extraction might not be that direct: e.g., if multimedia data are used to represent the query result, how will the extraction tools identify the domain values? (Zebin)
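
On the data-preparation question: if, as the correlation-mining formulation suggests, DCM only needs to know which attribute names occur in which interface, then preparation amounts to turning each extracted form into a set of normalized labels, and no query results are needed for the correlation counts. The sketch below illustrates that reading only. The stop-word list, the normalization rule, the `min_count` cutoff for rare attributes, and the function names are all hypothetical, and the paper's actual preprocessing may differ.

```python
import re
from collections import Counter

# Hypothetical words dropped from form labels before matching.
STOP_WORDS = {"a", "an", "the", "of", "by", "search", "enter"}

# Hypothetical labels extracted from three search forms.
FORMS = [
    ["Author:", "Title", "ISBN #"],
    ["First Name", "Last Name", "Search by Title"],
    ["author", "Subject"],
]


def normalize_label(label):
    """Lowercase a form label, keep alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", label.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def forms_to_schemas(forms):
    """Turn each extracted form (a list of labels) into a set of attribute names."""
    schemas = []
    for labels in forms:
        names = (normalize_label(label) for label in labels)
        schemas.append(frozenset(n for n in names if n))  # skip labels that normalize to ""
    return schemas


def prune_rare(schemas, min_count):
    """Drop attributes that occur in fewer than min_count schemas."""
    freq = Counter(name for schema in schemas for name in schema)
    return [frozenset(n for n in schema if freq[n] >= min_count) for schema in schemas]


if __name__ == "__main__":
    schemas = forms_to_schemas(FORMS)
    print(schemas)
    print(prune_rare(schemas, min_count=2))
```

Only the presence of each label is recorded here; whether domain values should also be used, as Zebin asks, is left open by this sketch.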

Questions from Homework 2 (cont’d)
About the data mining approach:
– Why not keep attribute groups that do not have a synonym, and ask the user to provide likely synonyms for future matching candidates? (Shiwoong)
– Why is the number of possible m:n matchings exponential? Can this algorithm be improved? (DH)
– The thresholds for the positive and negative correlation measures are set manually, which is not very robust. Is there any way to apply a learning algorithm to DCM? (Jiawei)
– What’s the challenge in matching multi-domain schemas? (Jiawei)
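
One way to see why the number of candidate m:n matchings explodes: with n attributes in a domain, any subset of two or more attributes could in principle form a group, giving 2^n - n - 1 candidate groups, and the number of ways to partition all n attributes into groups is the Bell number B(n), which grows faster than exponentially. The sketch below only counts these quantities; it is a back-of-the-envelope illustration, not the paper's complexity analysis, and the function names are hypothetical.

```python
from math import comb


def candidate_groups(n):
    """Subsets of n attributes with at least two members: 2^n - n - 1."""
    return 2 ** n - n - 1


def bell(n):
    """Bell number B(n): ways to partition n attributes into non-empty groups.

    Uses the recurrence B(m+1) = sum_{k=0..m} C(m, k) * B(k), with B(0) = 1.
    """
    b = [1]
    for m in range(n):
        b.append(sum(comb(m, k) * b[k] for k in range(m + 1)))
    return b[n]


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, candidate_groups(n), bell(n))
```

Even n = 20 attributes gives over a million candidate groups, so a practical method has to prune the search rather than enumerate it.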

Questions from Homework 2 (cont’d)
Other questions about DCM:
– Is there anything that uses DCM on the web to produce Dogpile-style results? For example, something that could match the future deep-Web online product catalogs of Circuit City, Best Buy, and Fry’s, finding a user-specified product that may be available at several different stores, so the user can bid for the best-qualified product or the lowest price. (Amanda)
– Could the authors get better results with measures other than the H-measure, perhaps the confidence measure?
– Won’t the name-based matchings fall prey to homonyms?
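
For the question about confidence versus the H-measure, it can help to compute several standard association measures on the same 2x2 contingency table (f11: both attributes present; f10 and f01: only one present; f00: neither). In the hypothetical sketch below, the example counts and the function name are made up, and H follows the form H = f01·f10 / (f+1·f1+) as we read it from the DCM paper. Lift changes when many extra schemas contain neither attribute (a larger f00, i.e. sparser data), whereas H, confidence, and Jaccard do not. Confidence is also directional, so using it for matching would require some rule for combining the two directions.

```python
def association_measures(f11, f10, f01, f00):
    """Compute several pairwise measures from a 2x2 contingency table."""
    f1_plus = f11 + f10  # schemas containing attribute a
    f_plus1 = f11 + f01  # schemas containing attribute b
    total = f11 + f10 + f01 + f00
    return {
        "H": (f01 * f10) / (f_plus1 * f1_plus),
        "conf_a_to_b": f11 / f1_plus,  # estimate of P(b | a)
        "conf_b_to_a": f11 / f_plus1,  # estimate of P(a | b)
        "jaccard": f11 / (f11 + f10 + f01),
        "lift": (f11 * total) / (f1_plus * f_plus1),  # the only measure here that uses f00
    }


if __name__ == "__main__":
    dense = association_measures(8, 2, 2, 88)
    sparse = association_measures(8, 2, 2, 988)  # same co-occurrences, more empty schemas
    skewed = association_measures(2, 0, 48, 50)  # a is rare and always appears with b
    for name, m in (("dense", dense), ("sparse", sparse), ("skewed", skewed)):
        print(name, {k: round(v, 3) for k, v in m.items()})
```

The skewed case shows the directionality: confidence from a to b is 1.0 but from b to a only 0.04, while H gives the same value whichever attribute is called a.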