National Health Insurance Research Database (NHIRD) Papers: Output Analysis, Research Directions, and Usage Demonstration


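1 National Health Insurance Research Database (NHIRD) Papers: Output Analysis, Research Directions, and Usage Demonstration
Tzeng-Ji Chen (陳曾基), Institute of Hospital and Health Care Administration, School of Medicine, National Yang-Ming University; Department of Family Medicine, Taipei Veterans General Hospital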

2 Playing the monk for a day at Taipei Medical University
Der Prophet gilt nichts im eigenen Land (German: a prophet counts for nothing in his own country). Nullus propheta in patria (Latin: no one is a prophet in his own country). An ass in Germany is a professor in Rome.

3 -

4

5 Materials: Extract data from PubMed
(
"insurance, health"[MeSH Terms] OR "national health programs"[MeSH Terms]
OR health insurance[TW] OR national health[TW] OR national insurance[TW]
OR claims data*[TW] OR claim data*[TW] OR insurance claim*[TW] OR insurance data*[TW]
OR administrative data*[TW] OR nationwide data*[TW] OR national data*[TW]
OR NHIRD[TW] OR NHI[TW] OR BNHI[TW] OR population based[TW]
OR population*[ti] OR nationwide[ti]
)
AND taiwan[All Fields] AND English[lang] AND 1996:2009[dp]
2010
* Accuracy not guaranteed!
Courtesy of Yu-Chun Chen
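A long search expression like the one above is easier to maintain when it is assembled from term lists. The following is a minimal sketch, assuming only that each term takes one PubMed field tag; the helper name `or_group` and the variable names are hypothetical and not part of the original slides.

```python
def or_group(terms, tag):
    """Join terms with OR, appending a field tag such as [TW] to each term."""
    return " OR ".join(f"{term}[{tag}]" for term in terms)

# Term lists copied from the slide's query.
mesh_terms = ['"insurance, health"', '"national health programs"']
text_words = [
    "health insurance", "national health", "national insurance",
    "claims data*", "claim data*", "insurance claim*", "insurance data*",
    "administrative data*", "nationwide data*", "national data*",
    "NHIRD", "NHI", "BNHI", "population based",
]
title_words = ["population*", "nationwide"]

# Topic block: any MeSH term, text word, or title word may match.
topic = " OR ".join([
    or_group(mesh_terms, "MeSH Terms"),
    or_group(text_words, "TW"),
    or_group(title_words, "ti"),
])

# Restrict to Taiwan, English-language papers, and the 1996-2009 date range.
query = f"({topic}) AND taiwan[All Fields] AND English[lang] AND 1996:2009[dp]"
print(query)
```

Keeping the terms in lists makes it simple to add or drop a synonym and rerun the search without retyping the whole expression.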

6 Materials: Review of NHIRD Papers
383 articles were included.
Courtesy of Yu-Chun Chen

7 NHIRD Papers Grow Exponentially
Courtesy of Yu-Chun Chen

8 NHIRD Papers Increase In Both Quantity and Quality
Courtesy of Yu-Chun Chen

9 Cumulative Number of Papers Using NHIRD, 2000-2009
All counts are cumulative.
Publish year | NHIRD studies | NHIRD studies indexed in JCR 2008 | Authors | Study fields | Journals publishing papers
2003 | 27 | 18 | 83 | 31 | 22
2004 | 59 | 46 | 154 | 61 | 41
2005 | 85 | 69 | 219 | 84 | 56
2006 | 140 | 124 | 310 | 119 | 88
2008 | 276 | 251 | 510 | 202 | 159
2009 | 383 | 353 | 667 | 250 | 210
Average 5-year annual growth rate (%) a | 45.8 | 51.1 | 34.2 | 33.0 | 39.0
Doubling time b (year) | 1.80 | 1.73 | 2.22 | 2.21 | 2.01
Rows with missing cells (surviving values in source order): 2000: 1, 8. 2001: 2, 13. 2002: 9, 6, 36, 7. 2007: 183, 166, 384, 111.
a Annual growth rate = (no. of studies in current year - no. of studies in previous year) / no. of studies in previous year
b Doubling time is estimated by a fitted exponential model
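The two footnote definitions can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: it uses the cumulative study counts for 2003-2009 from the table, takes the first value of the incomplete 2007 row (183) as the study count, and fits the exponential model by ordinary least squares on the log counts. The slide does not say which years or fitting method were used for the doubling time, so that figure need not match the table.

```python
import math

# Cumulative number of NHIRD studies by publish year (from the table;
# 2007 = 183 is assumed to be the study count of the incomplete row).
cumulative_studies = {
    2003: 27, 2004: 59, 2005: 85, 2006: 140,
    2007: 183, 2008: 276, 2009: 383,
}

def annual_growth_rates(series):
    """Footnote a: (current - previous) / previous for each consecutive year."""
    years = sorted(series)
    return {
        year: (series[year] - series[prev]) / series[prev]
        for prev, year in zip(years, years[1:])
    }

def doubling_time(series):
    """Fit ln(count) = a + b * year by least squares; doubling time = ln 2 / b."""
    years = sorted(series)
    xs = [year - years[0] for year in years]
    ys = [math.log(series[year]) for year in years]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return math.log(2) / slope

rates = annual_growth_rates(cumulative_studies)
last_five = [rates[year] for year in sorted(rates)[-5:]]  # 2005 to 2009
avg_growth_pct = 100 * sum(last_five) / len(last_five)
doubling_years = doubling_time(cumulative_studies)

print(f"average 5-year annual growth rate: {avg_growth_pct:.1f}%")
print(f"doubling time from log-linear fit: {doubling_years:.2f} years")
```

Averaging the five yearly rates for 2005-2009 reproduces the 45.8% shown in the table for NHIRD studies.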

10 Distribution of Study Topics
Top 10 subjects in MeSH. Column groups: N = 59, N = 329, N = 383; each group reports No., %, and Rank. Some cells are missing, so each row lists the surviving values in source order.
Subject category (MeSH):
[H02] Health Occupations: 37, 62.7, 1, 160, 48.6, 197, 51.4
[E02] Therapeutics: 4, 6.8, 8, 43, 13.1, 2, 47, 12.3
[N03] Health Care Economics and Organizations: 6, 10.2, 39, 11.9, 45, 11.7, 3
[N05] Health Care Quality, Access, and Evaluation: 5.1, 10, 41, 12.5, 44, 11.5
[N02] Health Care Facilities, Manpower, and Services: 16.9, 31, 9.4, 7, 10.7, 5
[H01] Natural Science Disciplines: 35, 10.6
[N04] Health Services Administration: 8.5, 34, 10.3
[F04] Behavioral Disciplines and Activities: 26, 7.9, 32, 8.4
[I01] Social Sciences: 24, 7.3, 8.1, 9
[N06] Environment and Public Health: 30, 7.8
Courtesy of Yu-Chun Chen

11 Average IF in SCI Fields
SCI category | No. of articles | Average IF
HEALTH CARE SCIENCES & SERVICES | 51 | 1.798
PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH | 47 | 2.625
PSYCHIATRY | 41 | 3.272
HEALTH POLICY & SERVICES | 39 | 1.868
PHARMACOLOGY & PHARMACY | 34 | 2.740
MEDICINE, GENERAL & INTERNAL | 33 | 1.541
CLINICAL NEUROLOGY | | 3.153
OBSTETRICS & GYNECOLOGY | 21 | 2.202
SURGERY | 17 | 2.349
PEDIATRICS | 15 | 3.216
CARDIAC & CARDIOVASCULAR SYSTEMS | 14 | 2.463
ENDOCRINOLOGY & METABOLISM | 13 | 4.671
GASTROENTEROLOGY & HEPATOLOGY | | 3.669
OPHTHALMOLOGY | 12 | 2.791
IMMUNOLOGY | | 3.804
NEUROSCIENCES | 11 | 2.153
RESPIRATORY SYSTEM | | 3.148
PERIPHERAL VASCULAR DISEASE | 10 | 5.532
MEDICINE, RESEARCH & EXPERIMENTAL | | 2.366
Courtesy of Yu-Chun Chen

12 Productivity of authors: reproducible success?
No. of articles per author | No. of authors | Cum. % of all authors | Cumulative contribution to articles | Cumulative % of all research
>20 | 8 | 1.2 | 175 | 45.7
 | 18 | 3.9 | 231 | 60.3
2 - 9 | 245 | 40.6 | 365 | 95.3
1 | 396 | 100.0 | 383 |

Author | No. of articles
Lin HC | 99
Lee HC | 36
Chou YJ | 30
Lee CH | 29
Chen TJ | 25
Chou P | 24
Hwang SJ | 21
Xirasagar S | 20
Chen CS | 19
Chen YH | 18
Huang N |
Chou LF | 16
Liu TC |
Chang HJ | 15
Lin CH |
Chen YC | 14
Tang CH |
Huang WF | 11
Wang JD |
Yang CY |
Courtesy of Yu-Chun Chen

13 Status: 6 Apr 2011

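14 Distribution of NHIRD Papers by Journal Impact Factor and Year
Columns: up to 2002 (n = 9), 2003 (n = 18), 2004 (n = 32), 2005 (n = 26), 2006 (n = 55), 2007 (n = 43), 2008 (n = 93), 2009 (n = 107). Empty cells are missing, so each row lists the surviving values in source order.
IF (2008):
>= 10: 3
[5-10): 2, 1, 5, 7, 14
[3-5): 6, 13, 20, 28
[1-3): 17, 12, 32, 27, 50, 49
< 1: 4, 8
NA: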

15 Social Network Analysis as a Tool to Visualize Flow of Information
Analysis of studies using health databases
Institute of Medical Informatics, Heidelberg University, Germany
Yu-Chun Chen (陳育群)
Sep 25, 2010

16 Collaboration Network: 2000-2009
Chen YC et al. Scientometrics (2010). Taiwan’s NHIRD: administrative health care database as study object in bibliometrics

17 Collaboration network: 2000

18 Collaboration network: 2001

19 Collaboration network: 2002

20 Collaboration network: 2003

21 Collaboration network: 2004

22 Collaboration network: 2005

23 Collaboration network: 2006

24 Collaboration network: 2007

25 Collaboration network: 2008

26 Collaboration network: 2009

27 Collaboration network: 2009 (label)

28 Design of Claims-Based Studies
Observational studies
- Descriptive studies
- Analytic studies: ecological, cross-sectional, case-control, cohort
Data processing: from simple to complex
Clinical relevance: from low to high

29 Types of Study Designs / Computation
- Simple description
- Association / Relationship
- Complex computation (Data mining)

30 Simple Description
Disease
- Epidemiologic features of Kawasaki disease in Taiwan,
- A nationwide survey on epidemiological characteristics of childhood Henoch-Schönlein purpura in Taiwan.
- Prevalence and risks of chronic airway obstruction: a population cohort study in Taiwan.
Drug
- Utilization of hepatoprotectants within the National Health Insurance in Taiwan.
- Demographics and patterns of acupuncture use in the Chinese population: the Taiwan experience.
Person
- Risks and causes of hospitalizations among physicians in Taiwan
Specialty
- Use frequency of traditional Chinese medicine in Taiwan.
Sector
- Patterns of ambulatory care utilization in Taiwan.

31 Association / Relationship
A ↔ B (no temporal consideration)
- Association between physician volume and hospitalization costs for patients with stroke in Taiwan: a nationwide population-based study.
A ↔ B (temporal change)
- Seasonal variations in urinary calculi attacks and the association with climate: a population-based study.
A => B (temporal sequence)
- Risk of extrapyramidal syndrome in schizophrenic patients treated with antipsychotics: a population-based study.
- Sudden sensorineural hearing loss increases the risk of stroke: A 5-year follow-up study
- Does elective caesarean section increase utilization of postpartum maternal medical care?
* Control Group

32 Complex Computation
Association rule mining
- Application of a data-mining technique to analyze coprescription patterns for antacids in Taiwan.
Frequent itemset mining
- The prescriptions frequencies and patterns of Chinese herbal medicine for allergic rhinitis in Taiwan.
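To make the two techniques named on this slide concrete, here is a toy sketch of frequent itemset counting (limited to pairs) and the confidence of one association rule. The prescription data, drug labels, and the 0.4 support threshold are hypothetical illustrations, not taken from the cited papers, and real studies would typically use a full algorithm such as Apriori over many more items.

```python
from collections import Counter
from itertools import combinations

# Hypothetical prescriptions: each set lists the drug classes on one visit.
prescriptions = [
    {"antacid", "nsaid"},
    {"antacid", "nsaid", "antibiotic"},
    {"antacid", "antibiotic"},
    {"nsaid"},
    {"antacid", "nsaid"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (share of transactions) >= min_support."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    total = len(transactions)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    with_both = [t for t in with_antecedent if consequent in t]
    return len(with_both) / len(with_antecedent)

pairs = frequent_pairs(prescriptions, min_support=0.4)
nsaid_to_antacid = confidence(prescriptions, "nsaid", "antacid")

print(pairs)
print(nsaid_to_antacid)
```

In this toy data the pair (antacid, nsaid) appears in 3 of 5 prescriptions, and 3 of the 4 prescriptions containing an NSAID also contain an antacid.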

33 Simple Description
Disease
- Epidemiologic features of Kawasaki disease in Taiwan,
- A nationwide survey on epidemiological characteristics of childhood Henoch-Schonlein purpura in Taiwan.
- Prevalence and risks of chronic airway obstruction: a population cohort study in Taiwan.
Drug
- Utilization of hepatoprotectants within the National Health Insurance in Taiwan.
- Demographics and patterns of acupuncture use in the Chinese population: the Taiwan experience.
Specialty
- Use frequency of traditional Chinese medicine in Taiwan.
Sector
- Patterns of ambulatory care utilization in Taiwan.
Journals highlighted:
- Pediatrics: IF 4.789, Ranking 2 / 86 (Pediatrics)
- Rheumatology: IF 4.136, Ranking 7 / 22 (Rheumatology)
- Chest: IF 5.154, Ranking 4 / 40 (Respiratory system)

34 Association / Relationship
A ↔ B (no temporal consideration)
- Association between physician volume and hospitalization costs for patients with stroke in Taiwan: a nationwide population-based study.
A ↔ B (temporal change)
- Seasonal variations in urinary calculi attacks and the association with climate: a population-based study.
A => B (temporal sequence)
- Risk of extrapyramidal syndrome in schizophrenic patients treated with antipsychotics: a population-based study.
- Sudden sensorineural hearing loss increases the risk of stroke: A 5-year follow-up study
- Does elective caesarean section increase utilization of postpartum maternal medical care?
Journals highlighted:
- Stroke: IF 6.499, Ranking 6 / 156 (Clinical Neurology)
- J Urology: IF 3.952, Ranking 9 / 57 (Urology …)
- Clin Pharmacol Ther: IF 7.586, Ranking 9 / 219 (Pharmacology …)
- Stroke: IF 6.499, Ranking 6 / 156 (Clinical Neurology)
- Med Care: IF 3.194, Ranking 5 / 62 (Health Care Sciences ...)

35 Complex Computation
Association rule mining
- Application of a data-mining technique to analyze coprescription patterns for antacids in Taiwan.
Frequent itemset mining
- The prescriptions frequencies and patterns of Chinese herbal medicine for allergic rhinitis in Taiwan.
Journal highlighted:
- Allergy: IF 6.204, Ranking 2 / 17 (Allergy)

36 Usage Demonstration (prepared by Dr. Yu-Chun Chen)

37 NHIRD Datasets

38 The NHI database looks like this…

39 …and turns into this. It looks like AOM and COM are more frequent in spring!

40

41 File Structures of Datasets to Process
- Single file
- Multiple files:
  - Of the same format
  - Of similar formats in different years
  - Of different formats, but connected through a primary key / foreign key or a look-up table
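The two multi-file cases above map onto two SQL operations: files of the same format are stacked with UNION ALL, and files of different formats are connected through a key. The sketch below runs this in SQLite from Python for convenience; the yearly tables `cd2005` and `cd2006`, the look-up table `hosp_level`, and all rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Two yearly files of the same format (hypothetical toy rows).
CREATE TABLE cd2005 (id TEXT, hosp_id TEXT, func_date TEXT);
CREATE TABLE cd2006 (id TEXT, hosp_id TEXT, func_date TEXT);
-- A look-up table of a different format, linked through hosp_id.
CREATE TABLE hosp_level (hosp_id TEXT PRIMARY KEY, level_name TEXT);

INSERT INTO cd2005 VALUES ('A01', 'H1', '2005-03-02');
INSERT INTO cd2006 VALUES ('A01', 'H2', '2006-07-15'), ('B02', 'H1', '2006-01-09');
INSERT INTO hosp_level VALUES ('H1', 'medical center'), ('H2', 'clinic');
""")

rows = con.execute("""
SELECT v.id, v.func_date, h.level_name
FROM (SELECT * FROM cd2005
      UNION ALL
      SELECT * FROM cd2006) AS v          -- stack same-format yearly files
JOIN hosp_level AS h
  ON v.hosp_id = h.hosp_id                -- foreign key to the look-up table
ORDER BY v.id, v.func_date
""").fetchall()

for row in rows:
    print(row)
```

UNION ALL keeps every row, which is usually what is wanted when each yearly file holds distinct visits; plain UNION would also remove duplicates at extra cost.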

42

43 Main Tasks
1. Data conversion: READ
2. Variable processing: SELECT, FILTER, APPEND, UNION, JOIN, SORT
3. Result analysis: AGGREGATE, OUTPUT, STATISTICS

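44 Programming languages, statistical software packages, database management systems

45 1. Data conversion 2. Variable processing 3. Result analysis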

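46 Data conversion
1. Data conversion: READ
Read the data from the source, convert them into a form suitable for analysis, and load them into the database.
This usually goes together with data cleaning: unintegrated, invalid, missing, or erroneous data from the source systems are put in order before loading (Garbage In, Garbage Out).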
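The read, clean, and load steps described above can be sketched as follows. The record layout, field names, sample lines, and cleaning rules are all hypothetical; a real layout would come from the NHIRD codebook for the file in question.

```python
import sqlite3

# Hypothetical fixed-width layout: (field name, start, end) as slice offsets.
LAYOUT = [("id", 0, 4), ("sex", 4, 5), ("birth_ym", 5, 11)]

# Hypothetical raw lines, two of them deliberately dirty.
raw_lines = [
    "A001M196503",
    "B002F197811",
    "C003X198001",  # invalid sex code
    "D004F      ",  # missing birth year-month
]

def parse(line):
    """Cut one fixed-width line into named, whitespace-trimmed fields."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}

def is_clean(record):
    """Keep records with a valid sex code and a six-digit birth year-month."""
    return (
        record["sex"] in {"M", "F"}
        and len(record["birth_ym"]) == 6
        and record["birth_ym"].isdigit()
    )

records = [parse(line) for line in raw_lines]
clean = [record for record in records if is_clean(record)]
rejected = [record for record in records if not is_clean(record)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (id TEXT, sex TEXT, birth_ym TEXT)")
con.executemany("INSERT INTO person VALUES (:id, :sex, :birth_ym)", clean)
loaded = con.execute("SELECT COUNT(*) FROM person").fetchone()[0]

print(f"loaded {loaded} records, rejected {len(rejected)}")
```

Keeping the rejected records in a separate list, rather than silently dropping them, makes it possible to report how much data the cleaning step removed.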

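47 Variable processing
2. Variable processing: SELECT, FILTER, APPEND, UNION, JOIN, SORT
Link the raw information stored separately in different tables (hospital level, drug classification, disease classification, catastrophic illness, and so on) by joining many tables.
Data enrichment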

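48 Using SAS as an example

49 A practical example (1): Given a cohort (ID_Cohort), we want to know its visits from 2000 to 2006 (hospital level, hospital location, visit date).

50 A practical example (2): Given a cohort (ID_Cohort), after applying the exclusion criteria, we want the date of the first visit between 2000 and 2006.

51 A practical example (3): Given a cohort (ID_Cohort), after removing ineligible cases (exclusion criteria), we want the date and place (hospital level, hospital location) of the last visit between 2000 and 2006, the treatment period, and the drugs used during that period. This one is somewhat complicated.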

52

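53 SQL syntax: SQL (Structured Query Language)
- Developed at IBM in the 1970s for large databases
- Covers data definition, manipulation, query, and control
- Well suited to linking data; available in nearly all database systems (MS SQL Server, IBM DB2, Oracle, MySQL, etc.)
- SAS has included a dedicated data-manipulation module (PROC SQL) since version 6.0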

54 What’s COOL in SQL? SELECT, FILTER, APPEND, UNION, SORT, JOIN
- SQL is designed for MULTIPLE RELATION tables
- JOIN: MERGE (in SAS) is a special case of JOIN (equi-join)
- Reads like English

55 A practical example (1): Given a cohort (ID_Cohort), we want to know its visits from 2000 to 2006 (hospital level, hospital location, visit date).

56 Q: What are the visits of a cohort (ID_Cohort): hospital level, hospital location, visit date?
SELECT hosp_cont_type, area_no, func_date
FROM cd
JOIN ID_Cohort ON cd.id = ID_Cohort.id
JOIN HOSB2006 ON cd.hosp_id = HOSB2006.hosp_id
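The query above can be tried end to end on toy data. This sketch runs it in SQLite from Python; the table and column names follow the slide, while the rows, the ISO date format, and the added 2000-2006 date filter are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE cd (id TEXT, hosp_id TEXT, func_date TEXT);        -- visits
CREATE TABLE ID_Cohort (id TEXT);                               -- study cohort
CREATE TABLE HOSB2006 (hosp_id TEXT, hosp_cont_type TEXT, area_no TEXT);

-- Hypothetical toy rows; Z99 is not in the cohort.
INSERT INTO cd VALUES
    ('A01', 'H1', '2003-05-01'),
    ('A01', 'H2', '2005-11-20'),
    ('Z99', 'H1', '2004-02-02');
INSERT INTO ID_Cohort VALUES ('A01');
INSERT INTO HOSB2006 VALUES ('H1', '1', '0101'), ('H2', '4', '3501');
""")

visits = con.execute("""
SELECT cd.id, HOSB2006.hosp_cont_type, HOSB2006.area_no, cd.func_date
FROM cd
JOIN ID_Cohort ON cd.id = ID_Cohort.id             -- keep cohort members only
JOIN HOSB2006 ON cd.hosp_id = HOSB2006.hosp_id     -- attach hospital attributes
WHERE cd.func_date BETWEEN '2000-01-01' AND '2006-12-31'
ORDER BY cd.func_date
""").fetchall()

for visit in visits:
    print(visit)
```

The inner join to `ID_Cohort` acts as the filter: visits by anyone outside the cohort simply find no matching row and drop out.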

57 A practical example (2): Given a cohort (ID_Cohort), after applying the exclusion criteria, we want the date of the first visit between 2000 and 2006.

58 Q: For a cohort (ID_Cohort), after applying the exclusion criteria, what is the date of the first visit?
SELECT cd.id, MIN(func_date) AS FirstVisit
FROM cd
JOIN ID_Cohort ON cd.id = ID_Cohort.id
WHERE cd.id NOT IN ( SELECT id FROM excludeCriteria )
GROUP BY cd.id
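A runnable version of the first-visit query, again in SQLite from Python with hypothetical toy rows. The `id` column is qualified as `cd.id` throughout, since both `cd` and `ID_Cohort` carry a column of that name and an unqualified reference would be ambiguous.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE cd (id TEXT, func_date TEXT);
CREATE TABLE ID_Cohort (id TEXT);
CREATE TABLE excludeCriteria (id TEXT);

-- Hypothetical toy rows: B02 is excluded, Z99 is outside the cohort.
INSERT INTO cd VALUES
    ('A01', '2003-05-01'), ('A01', '2001-02-10'),
    ('B02', '2002-03-03'), ('C03', '2004-04-04'),
    ('Z99', '2000-01-15');
INSERT INTO ID_Cohort VALUES ('A01'), ('B02'), ('C03');
INSERT INTO excludeCriteria VALUES ('B02');
""")

first_visits = con.execute("""
SELECT cd.id, MIN(cd.func_date) AS FirstVisit
FROM cd
JOIN ID_Cohort ON cd.id = ID_Cohort.id
WHERE cd.id NOT IN (SELECT id FROM excludeCriteria)
GROUP BY cd.id
ORDER BY cd.id
""").fetchall()

for row in first_visits:
    print(row)
```

One caution with this pattern: if `excludeCriteria.id` can contain NULL, `NOT IN` matches no rows at all, so a `NOT EXISTS` subquery is the safer form in that case.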

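59 A practical example (3): Given a cohort (ID_Cohort), after removing ineligible cases (exclusion criteria), we want the date and place (hospital level, hospital location) of the last visit between 2000 and 2006, the treatment period, and the drugs used during that period. This one is somewhat complicated.

60 Q: For a cohort (ID_Cohort), after removing ineligible cases (exclusion criteria), what are the date and place (hospital level, hospital location) of the last visit, the treatment period, and the drugs used during that period?
WITH tmpVisit AS (
    SELECT cd.id, MIN(func_date) AS FirstVisit, MAX(func_date) AS LastVisit
    FROM cd
    JOIN ID_Cohort ON cd.id = ID_Cohort.id
    WHERE cd.id NOT IN ( SELECT id FROM excludeCriteria )
    GROUP BY cd.id
)
SELECT id, DATEDIFF(month, FirstVisit, LastVisit) AS Duration
FROM tmpVisit
……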
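The first step of this query, a common table expression followed by a duration calculation, can be sketched in SQLite from Python. SQLite has no `DATEDIFF`, so the month difference is computed here from the year and month parts, which counts month boundaries crossed; that substitution, the ISO date format, and the toy rows are assumptions, and the hospital and drug joins elided on the slide are left out.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE cd (id TEXT, func_date TEXT);
CREATE TABLE ID_Cohort (id TEXT);
CREATE TABLE excludeCriteria (id TEXT);

-- Hypothetical toy rows: B02 is excluded from the cohort.
INSERT INTO cd VALUES
    ('A01', '2001-02-10'), ('A01', '2003-05-01'),
    ('B02', '2002-03-03'), ('C03', '2004-04-04');
INSERT INTO ID_Cohort VALUES ('A01'), ('B02'), ('C03');
INSERT INTO excludeCriteria VALUES ('B02');
""")

durations = con.execute("""
WITH tmpVisit AS (
    SELECT cd.id AS id,
           MIN(cd.func_date) AS FirstVisit,
           MAX(cd.func_date) AS LastVisit
    FROM cd
    JOIN ID_Cohort ON cd.id = ID_Cohort.id
    WHERE cd.id NOT IN (SELECT id FROM excludeCriteria)
    GROUP BY cd.id
)
SELECT id, FirstVisit, LastVisit,
       -- months between first and last visit (stand-in for DATEDIFF)
       (CAST(strftime('%Y', LastVisit) AS INTEGER)
          - CAST(strftime('%Y', FirstVisit) AS INTEGER)) * 12
     + (CAST(strftime('%m', LastVisit) AS INTEGER)
          - CAST(strftime('%m', FirstVisit) AS INTEGER)) AS DurationMonths
FROM tmpVisit
ORDER BY id
""").fetchall()

for row in durations:
    print(row)
```

Naming the intermediate result `tmpVisit` keeps the per-person first and last dates in one place, so later joins for hospital attributes or drugs can build on it instead of repeating the aggregation.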

61 SQL Also Works in SAS
Q: What are the visits of a cohort (ID_Cohort): hospital level, hospital location, visit date?
SELECT hosp_cont_type, area_no, func_date
FROM cd
JOIN ID_Cohort ON cd.id = id_cohort.id
JOIN HOSB2006 ON cd.hosp_id = HOSB2006.hosp_id

62 SQL Also Works in SAS
Q: What are the visits of a cohort (ID_Cohort): hospital level, hospital location, visit date?
PROC SQL;
SELECT hosp_cont_type, area_no, func_date
FROM cd
JOIN ID_Cohort ON cd.id = id_cohort.id
JOIN HOSB2006 ON cd.hosp_id = HOSB2006.hosp_id
;
QUIT;

63 Example: Prevalence analysis
PATTERNS OF TRADITIONAL CHINESE MEDICINE (TCM) USE IN PATIENTS WITH INFLAMMATORY BOWEL DISEASE (IBD): A POPULATION STUDY IN TAIWAN Yu-Chun Chen, Fang-Pey Chen, Tzeng-Ji Chen, Li-Fang Chou, Shinn-Jang Hwang. Hepato-Gastroenterology 2008;55: [SCI]

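64 Research Objective
What is the prevalence of inflammatory bowel disease (IBD) in Taiwan?
Databases used: registry of beneficiaries (承保資料檔), catastrophic illness registry (重大傷病檔), TCM outpatient prescription and treatment detail file (中醫門診處方及治療明細檔)

65 National Health Research Institutes, National Health Insurance database
TCM outpatient prescription and treatment detail files (CM_CD files), years …, 228 files in total, 82 GB

66 Part of the TCM outpatient prescription and treatment detail file
One file for December 2005: 444 MB, 1,550,000 records

67 Example: prevalence of IBD in Taiwan
Data conversion, variable processing, result analysis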

68 Data Processing with SQL Server 2005

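69 SQL Server: Task 1. Data conversion
Convert the NHI data files into SQL tables and store them in the IBD_DATA database.
BULK INSERT IBD_DATA..HV            -- load into
FROM 'xxxxx.dat'                    -- from
WITH (                              -- options
    BATCHSIZE = ,
    FORMATFILE = '檔案格式.fmt'      -- placeholder name for the format file
)

70 SQL Server: Task 2. Variable processing
IBD patients: select catastrophic illness codes 555.x or 556.x (used as the numerator)
SELECT id                           -- select
FROM HV                             -- from HV
WHERE LEFT(ACODE_ICD, 3) = '555'    -- condition
   OR LEFT(ACODE_ICD, 3) = '556'

71 SQL Server: Task 2. Variable processing
Population: select all insured persons in 2004 (used as the denominator)
SELECT Pop.*, IBDpt.ID              -- select
FROM Pop                            -- from Pop
LEFT OUTER JOIN IBDpt               -- join
ON Pop.id = IBDpt.id                -- on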

72 SQL Server: Task 3. Result output
Compute the prevalence by sex and age group
SELECT sex, age, COUNT(*)           -- select
FROM JoinTABLE                      -- from JoinTABLE
GROUP BY sex, age                   -- aggregate by group
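Tasks 2 and 3 can be combined into one small runnable sketch: pick the IBD patients as the numerator, left-join them onto the population as the denominator, and aggregate by sex and age group. It runs in SQLite from Python, so `substr` replaces the T-SQL `LEFT` function; the toy rows, the age-group labels, and the column alias names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE HV (id TEXT, ACODE_ICD TEXT);        -- catastrophic illness registry
CREATE TABLE Pop (id TEXT, sex TEXT, age TEXT);   -- insured population

-- Hypothetical toy rows: P1 and P2 carry IBD codes, P3 does not.
INSERT INTO HV VALUES ('P1', '5559'), ('P2', '5569'), ('P3', '7140');
INSERT INTO Pop VALUES
    ('P1', 'M', '20-39'), ('P2', 'F', '20-39'), ('P3', 'F', '40-59'),
    ('P4', 'M', '20-39'), ('P5', 'F', '20-39'), ('P6', 'M', '40-59');
""")

prevalence = con.execute("""
WITH IBDpt AS (
    SELECT DISTINCT id
    FROM HV
    WHERE substr(ACODE_ICD, 1, 3) IN ('555', '556')   -- numerator
)
SELECT Pop.sex, Pop.age,
       COUNT(*)        AS n_pop,      -- denominator: everyone in the stratum
       COUNT(IBDpt.id) AS n_ibd,      -- numerator: COUNT skips NULLs from the left join
       100000.0 * COUNT(IBDpt.id) / COUNT(*) AS per_100k
FROM Pop
LEFT OUTER JOIN IBDpt ON Pop.id = IBDpt.id
GROUP BY Pop.sex, Pop.age
ORDER BY Pop.sex, Pop.age
""").fetchall()

for row in prevalence:
    print(row)
```

The left outer join keeps every insured person, with a NULL patient id for those without IBD, which is why counting the joined id column gives the cases while `COUNT(*)` gives the population in each stratum.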

73 SQL Server: Task 3. Result output: compute the prevalence by sex and age group

74 Results
- Prevalence of IBD in Taiwan is 5.6 per 100,000; male > female.
- Women were more likely to use TCM than men (40.5% vs. 34.3%).
- 45.5% of patients had GI diagnoses at their TCM visits.
- Most of their TCM visits contained herbal remedies (90%).

75 I have a dream …

76

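77 The NHIRD research will enable the current generation of medical professionals in Taiwan to know the Amis better than the Amish.
NHIRD research lets Taiwan's new generation of health professionals understand Taitung better than the US East Coast.
* Amis: 阿美族, an indigenous people of Taiwan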

78 Some suggestions

79 How can we get started?
- Become familiar with the NHIRD codebooks and NHI regulations
- Think of research problems
- Read relevant literature
- Discuss with colleagues
- Find friends familiar with data processing
- Motivation and courage
- Tolerance and endurance

80 Paper Production Flow
Idea, Method, Writing, Journal, English, Materials, Computing, Statistics, Tools, Teams, Infrastructure, Atmosphere

81

82 Advertisement

83 Open-source P-Q-R Solutions to NHIRD Data Management
Coming !

84 Thanks for Your Attention !

