Presentation on theme: "Erudition 愛如生 Databases Yang Jidong Univ. of Michigan."— Presentation transcript:
Erudition 愛如生 Databases Yang Jidong Univ. of Michigan
Databases of pre-modern Chinese texts: an overview Scripta Sinica ( 漢籍電子文獻 ), Academia Sinica, Taipei. PROS: Earliest full-text database of pre- modern Chinese works. Based on modern punctuated and annotated versions. High accuracy. Acceptable price ($2,500/yr). CONS: Early contents based on Big5 encoding, many missing characters. Uneven qualities of digitized texts. Lack of strong financial and technological support for future development.
Databases of pre-modern Chinese texts: an overview (cont’d) Guoxue baodian ( 國學寶典 ), Guoxue Company, Beijing. PROS: Simplified characters (could be a con to some users). Most texts punctuated. Available in various media (Web, flash drive, CD- ROM, etc.). CONS: Relatively low accuracy. Early contents based on GB encoding with many missing characters. Some contents may have copy-right issues (challenged by various Chinese publishers in court).
Databases of pre-modern Chinese texts: an overview (cont’d) Unihan databases ( 書同文數據庫 ), including 四 部備要, 四部叢刊, 石刻文獻, 明清實錄、會典, 中 醫古文獻, etc., Unihan Company, Beijing. PROS: High accuracy. Using Unicode from the beginning. Reliable and fast access. Relatively low prices (one-time charges only for most databases, no annual fee). CONS: Except for the 四部備要 and 四 部叢刊, most databases target researchers with specific interests such as those studying stone inscriptions and history of Chinese medicine.
Databases of pre-modern Chinese texts: an overview (cont’d) Wenyuan Ge Siku quanshu ( 文淵閣庫全書 ), digitized by Unihan Company and marketed by Dizhi in Hong Kong. PROS: A comprehensive collection of pre-modern works in the “four classes.” CONS: Same as Siku quanshu, a “purified” and “politically correct” library of the Qing Dynasty.
Erudition databases: background Liu Junwen 劉俊文, CEO and Chief Editor of Erudition. Well-known scholar in Tang history. PKU history professor. Authority in the study of medieval Chinese political and legal systems. Compiler of 《日本學者論中國古代史》. Investment from a private source. Most other Chinese database companies received heavy financial support from the government, but Erudition “did not get a penny” so far.
Erudition databases: content Main product: 中國基本古籍庫, containing 10,000 titles. All digitized from print editions “carefully selected” by experienced scholars and believed by many experts to be the best available. Very high accuracy. Because of its high quality, the database is cited by many recent studies on pre-modern China as a primary source (just like the 中華書局 edition of the 24 dynastic histories, and the 中國古 典文學叢書 from 上海古籍出版社 ). Other Erudition databases: 俗文學, 方志, 申報, 敦煌文 書, etc.
Erudition databases: technology Based on Unicode. Over the past decade, the number of Unicode unified CJK ideographs has grown to more than 70,000 (by comparison, the 漢 語大詞典 has more than 50,000 characters). OCR (followed by manual proofreading) was used in the production of the 中國基本古籍庫. But it was abandoned by Erudition soon after that. All new databases are made by human inputting (using stroke-based—not romanization-based—input methods).
Erudition databases: marketing The Erudition database (esp. the 中国基本古籍库 ) sells pretty well in China, Hong Kong, Taiwan, and Japan. In the West, Berlin State Library and Library of Congress are their customers. Harvard, Columbia, Princeton, Yale, Chicago, Michigan, and Northwestern have recently formed a consortium to purchase the 方志 and 俗文學 databases from Erudition. Erudition databases are the most expensive among Chinese databases. Consortium purchase is the only solution for North American institutions.