Rey-Long Liu Dept. of Medical Informatics Tzu Chi University Taiwan


1 Identification of Conclusive Association Entities by Biomedical Association Mining
Rey-Long Liu, Dept. of Medical Informatics, Tzu Chi University, Taiwan

2 Outline
Background
The proposed technique: AMICAE
Empirical evaluation
Conclusion

3 Background

4 Conclusive Association Entities (CAEs)
Biomedical entities
Example: Genes, diseases, and chemicals
Conclusive Association Entities (CAEs) in a scholarly article a
Those biomedical entities that are specific targets on which conclusive findings about their associations are reported in a

5 Dopamine is not a specific target, so it is not a CAE
Results on TDHL are not conclusive enough, so it is not a CAE

6 Even if an entity appears in the title, it is not necessarily a CAE (it may not be a specific target)

7 Why Identification of CAEs?
Goal: To support analysis of conclusive findings on specific entities
A routine task of biomedical scientists
CTD (Comparative Toxicogenomics Database)
GHR (Genetics Home Reference)
OMIM (Online Mendelian Inheritance in Man)
Challenge: It is quite difficult to
Identify specific entities, and
Estimate how conclusive the findings on entities are

8 Our Goal
Developing a technique AMICAE (Association Mining for Identification of CAEs)
Given: Title and abstract of a scholarly article a
Output: An indicator to improve CAE identification by association mining

9 Related Work (1/4)
Article indexing by biomedical terms
Goal: A kind of text classification task
Example: Indexing of articles by MeSH (Medical Subject Headings)
But we have a different goal
Prioritizing CAEs that appear in an article
Rather than classifying the article into certain categories

10 Related Work (2/4)
Prediction of new associations by existing associations
Goal: Predicting new possible associations that deserve further analysis
But we have a different goal
Identification of CAEs that have already been published in articles

11 Related Work (3/4)
Extraction of biomedical entity associations
Goal: Extracting (recognizing) biomedical associations mentioned in articles
But an entity in an association may not be a CAE
The association may not be conclusive enough
The association may not always exist, or
The association may be only related to the background of the article

12 Related Work (4/4)
Estimation of entity-article relatedness
Typical techniques
Ontology and statistical indicators that work on full-text articles
We propose to improve them by association mining
Association mining to refine entity-article relatedness estimation
Applicable even if full texts are not available

13 The Proposed Technique: AMICAE

14 Main Ideas
Potential associations (identified from a set of articles)
Inferred associations
Given an article a, if e1 and e3 are candidate entities in a, they are likely to be CAEs of a

15 Step 1/4
Given a candidate entity e in the title and the abstract of an article a, estimate its strength of being a potential CAE of a
Top-2 entities in each article are potential CAEs of the article
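
The selection rule in Step 1/4 can be sketched as follows. This is a minimal illustration, not the author's implementation: the strength scores are assumed to be already computed (the slides do not give the formula), and the function name `potential_caes`, the example entity names, and the alphabetical tie-break are all hypothetical.

```python
from typing import Dict, List


def potential_caes(strengths: Dict[str, float], k: int = 2) -> List[str]:
    """Return the k candidate entities with the highest strength scores.

    `strengths` maps each candidate entity of one article to an assumed,
    precomputed strength of being a potential CAE of that article.
    """
    # Sort by descending score; break ties alphabetically so the result
    # is deterministic (the tie-break is an assumption of this sketch).
    ranked = sorted(strengths.items(), key=lambda item: (-item[1], item[0]))
    return [entity for entity, _ in ranked[:k]]


# Hypothetical scores for three candidate entities in one article.
scores = {"dopamine": 0.2, "GENE_A": 0.9, "DISEASE_B": 0.7}
top2 = potential_caes(scores)
```

With k = 2, each article contributes at most one pair of potential CAEs to the later association-mining step.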

16 Step 2/4
Given a set of articles, construct a network of potential associations based on the potential CAEs, and accordingly produce inferred (indirect) associations
S_{e1,e2} = Number of articles having <e1, e2> as a potential association
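
A rough sketch of Step 2/4, under stated assumptions: each article is represented only by its list of potential CAEs, the count of articles sharing a pair plays the role of S_{e1,e2}, and an inferred (indirect) association is taken to be a pair of entities that are not directly associated but share a common neighbor in the network. That two-hop rule is an assumption made here for illustration; the slides do not spell out how inferred associations are derived. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_potential_associations(articles):
    """Count, for each entity pair, the number of articles in which both
    entities are potential CAEs. Keys are alphabetically sorted tuples."""
    support = Counter()
    for caes in articles:
        for e1, e2 in combinations(sorted(set(caes)), 2):
            support[(e1, e2)] += 1
    return support


def inferred_associations(support):
    """Return pairs (e1, e3) that share a neighbor in the network of
    potential associations but are not directly associated themselves
    (an assumed two-hop rule)."""
    neighbors = {}
    for e1, e2 in support:
        neighbors.setdefault(e1, set()).add(e2)
        neighbors.setdefault(e2, set()).add(e1)
    inferred = set()
    for adjacent in neighbors.values():
        # sorted() keeps pair order consistent with the keys of `support`.
        for e1, e3 in combinations(sorted(adjacent), 2):
            if (e1, e3) not in support:
                inferred.add((e1, e3))
    return inferred


# Three hypothetical articles, each reduced to its potential CAEs.
articles = [["e1", "e2"], ["e2", "e3"], ["e1", "e2"]]
support = count_potential_associations(articles)
inferred = inferred_associations(support)
```

Here e1 and e3 never appear together as potential CAEs, but both are associated with e2, so the sketch reports <e1, e3> as an inferred association.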

17 Step 3/4
Estimate CorStrength_{e,a} (correlation strength between entity e and article a, based on association mining)

18 Step 4/4
Integrate CorStrength and other indicators by RankingSVM
Various entity-article relatedness indicators are tested
RankingSVM as a learning-based method to integrate CorStrength and these indicators
We will see how CorStrength improves these indicators in CAE identification
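
RankingSVM is commonly trained on pairwise preferences: within one article, a candidate that is a CAE should score higher than a candidate that is not. The sketch below shows only that pairwise reduction, not the SVM training itself, and it is not taken from the slides. The function name `pairwise_examples` and the two-dimensional feature vectors (standing in for indicator values such as CorStrength) are hypothetical.

```python
from itertools import combinations


def pairwise_examples(candidates):
    """Turn one article's candidates into pairwise training examples.

    `candidates` is a list of (feature_vector, is_cae) tuples, where the
    feature vector holds the indicator values of one entity. Each output
    item is (difference_vector, label); label +1 means the first entity
    of the pair should be ranked above the second, -1 the opposite.
    """
    pairs = []
    for (x_i, y_i), (x_j, y_j) in combinations(candidates, 2):
        if y_i == y_j:
            continue  # same label: no ranking preference to learn
        diff = [a - b for a, b in zip(x_i, x_j)]
        pairs.append((diff, 1 if y_i > y_j else -1))
    return pairs


# One hypothetical article: one CAE and two non-CAE candidates.
candidates = [([1.0, 0.75], 1), ([0.5, 0.25], 0), ([0.25, 0.5], 0)]
pairs = pairwise_examples(candidates)
```

A linear classifier fitted to these difference vectors yields a weight per indicator, and candidates of a new article are then ranked by the weighted sum of their indicator values.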

19 Empirical Evaluation

20 The data
Source of data: CTD (Comparative Toxicogenomics Database)
An online database of associations between three types of entities (chemicals, genes, and diseases)
Many biomedical scientists are recruited to frequently update the entity associations
Only conclusive associations are curated
About 50% of all articles in CTD (60,507 articles) with their CAEs appearing in their titles or abstracts

21 The Baselines
Typical indicators to estimate relatedness between an entity e and an article a:
S_e(a) = Set of sentences (in a) mentioning e
S_{e,x}(a) = Set of sentences where e and x co-occur
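
The two sentence sets used by the baseline indicators can be sketched as below. This is an illustration only: case-insensitive substring matching stands in for real biomedical entity recognition, sentences are identified by their index, and the function names and example sentences are hypothetical.

```python
def sentences_mentioning(sentences, entity):
    """S_e(a): indices of the sentences of article a that mention entity e.
    Substring matching is a simplification assumed for this sketch."""
    needle = entity.lower()
    return {i for i, s in enumerate(sentences) if needle in s.lower()}


def cooccurrence_sentences(sentences, e, x):
    """S_{e,x}(a): indices of the sentences where e and x co-occur."""
    return sentences_mentioning(sentences, e) & sentences_mentioning(sentences, x)


# A hypothetical three-sentence abstract.
abstract = [
    "GeneA expression is altered in DiseaseB.",
    "We measured GeneA levels in 40 samples.",
    "DiseaseB severity was scored by clinicians.",
]
s_e = sentences_mentioning(abstract, "GeneA")
s_ex = cooccurrence_sentences(abstract, "GeneA", "DiseaseB")
```

The sizes of these sets (for example, how many sentences mention e, or how many pair e with another entity) are the kind of raw counts that statistical relatedness indicators build on.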

22 Evaluation Criteria
MAP (Mean Average Precision)
If CAEs of an article are ranked high, average precision (AP) for the article will be higher
MAP is simply the average of the AP values for all articles
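
A minimal sketch of MAP as described on this slide, using the standard definition of average precision. It assumes each article comes with a ranked list of candidate entities and a set of gold-standard CAEs; a CAE that never appears in the ranking contributes zero. The function names and the example rankings are hypothetical.

```python
def average_precision(ranked, caes):
    """AP for one article: mean of the precision values measured at the
    rank of each CAE found in the ranked candidate list."""
    hits = 0
    total = 0.0
    for rank, entity in enumerate(ranked, start=1):
        if entity in caes:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(caes) if caes else 0.0


def mean_average_precision(cases):
    """MAP: average of the AP values over all (ranking, CAE set) cases."""
    return sum(average_precision(r, c) for r, c in cases) / len(cases)


# Two hypothetical articles with their rankings and gold-standard CAEs.
cases = [
    (["A", "B", "C"], {"A", "C"}),  # AP = (1/1 + 2/3) / 2
    (["X", "Y"], {"Y"}),            # AP = 1/2
]
map_score = mean_average_precision(cases)
```

In the first article the CAEs sit at ranks 1 and 3, giving an AP of about 0.83; pushing the second CAE up to rank 2 would raise it to 1.0, which is why ranking CAEs high increases MAP.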

23 Average If most CAEs of an article are ranked at top-X positions, for the article will be high Average is simply the average of the values for all articles

24 Result When CorStrength is added (i.e., there are six indicators integrated by RankingSVM), the performance is further improved significantly

25 When CorStrength is added (i.e., there are six indicators integrated by RankingSVM), a larger percentage of test articles (over 95%) can have their CAEs ranked at top positions (top-1 to top-3)

26 Conclusion

27 Identification of CAEs in the title and the abstract of an article is challenging
It is difficult to identify those specific entities on which the research findings reported in the article are conclusive enough
We develop AMICAE, which provides helpful information to improve CAE identification by association mining
Two candidate entities in an article are likely to be CAEs of the article if a strong association between them is mined from a collection of articles

