This section gives an overview of the contents of the Kanseki Repository. It first provides some background on earlier phases of the development and then discusses the layout of the repository.
One of the most frequently raised questions about the repository concerns the source of the texts. The short answer is that about half of the texts available here have been collected from the Internet over a period of more than 20 years, while the other half is from projects I have been involved with over the past 25 years. All texts have been extensively curated, i.e., proofread, edited, linked to a digital facsimile in many cases, and converted to the format required by the repository. The digital facsimiles, where available, have been separately acquired and subsequently linked to the text. This process is still ongoing and the results have not yet been thoroughly proofread.
The representation of characters that cannot yet be represented in Unicode has been standardized, and the process of finding substitute characters for a normalized version is still ongoing.
As can be seen from this, the curating of the texts in the Repository is an ongoing, open-ended process. In fact, the texts are being moved to a public repository in their current state not so much because the curation process has reached a major milestone, but rather to enable wider participation in the process. Editing even a single text is a major undertaking that requires much time and effort. Doing this on the scale of ten thousand texts is impossible for any one individual or even a single institution. Hopefully, releasing the repository even in its current state will encourage more contributions and in time lead to the inclusion of all important texts with the relevant editions in a well curated form. Even if that day is years or, more likely, decades away, the work done here will hopefully serve as a foundation and building block for achieving that worthy goal.
Phases of development
(1) 1989 - 1996: Early history – mostly individual texts
The Kanseki Repository as it stands today started out as a personal collection of digital text files, created and maintained as a by-product of my research in Sinology, more specifically on the Chinese sources of Chan/Zen Buddhism, and of my efforts to collect resources that would facilitate this work.
The first text to enter the archive was the subject of my MA thesis, the 300+ poems attributed to 王梵志 Wang Fanzhi (currently text No. KR6s0055), which I input myself in the autumn of 1989. This was a relatively short text that could be entered in just a few weeks (including the time needed to create characters not available on the system). The next text, the collection 全唐詩 Quan Tang shi (KR4h0140), was more substantial, however, consisting of 900 juan. It was handed to me by a colleague and friend I came to know through a "Chinese Computing" workshop in Bremen in 1990. The stage was then set for the arrival of more texts, such as the 25 histories (many of which can be found in the KR2a 正史類 Zhengshi lei section). At this point, the repository had already exceeded the capacity of my computer’s hard disk, forcing me to rely on an external drive. By this time I was doing research for my doctoral dissertation in Kyoto. I was attached to the International Research Institute for Zen Buddhism (IRIZ), where I worked on the Zen Knowledgebase project led by Urs App. We produced a number of digital texts for this project that were subsequently released on CD-ROM and on the Internet in June 1995. These texts are now in section KR6q 禪宗類 Chanzong lei of the repository.
(2) 1996 - 2001: Buddhist canonical texts
After the submission of my PhD thesis, I was given a CD-ROM of the Korean Tripiṭaka and spent a full summer analyzing its content and converting it to a readable format. The WWW Database of Buddhist Texts [wwwdb] was one of the fruits of this work. This database later proved useful for the work of the Chinese Buddhist Electronic Text Association (CBETA), to which I contributed from February 1998 (until 2001 as a resident and later as a consultant). Nearly 50% of the content of the Kanseki Repository originates from the texts produced by the CBETA team. The collaboration is ongoing, and additions or changes in the CBETA texts are systematically carried over to the Kanseki Repository.
(3) 2001 - 2012
With my move in 2001 to Kyoto University and the
Information Center for Chinese Studies (DICCS) (since 2009 the
Center for Informatics and East Asian Studies (CIEAS)) of the
Institute for Research in Humanities, the range of texts for
investigation and curation was broadened further. Several research
projects were instrumental here:
Toward an Overall Inheritance and Development of Kanji Culture (2003-2008)
This project, which under the leadership of 高田時雄 Takata Tokio was selected as a Center of Excellence (COE) project by the Japan Society for the Promotion of Science (JSPS), enabled a particular focus on historical texts that deal with events during the Tang dynasty. This resulted in extensive annotation of sections of the 資治通鑑 Zizhi tongjian (KR2b0007), but work has also been done on the two official histories of the Tang, among others.
Ceremony and Punishment in East Asia (2006-2010)
This project, which was also sponsored by the JSPS (project leader 冨谷至 Tomiya Itaru, project #18102003), resulted in a few texts related to institutional history, especially 唐六典 Tang liudian (KR2l0001), and facilitated the further development of methods and procedures.
道藏輯要 Daozang jiyao project (since 2005)
This is a major ongoing project aimed at analyzing the early Qing compilation of Daoist texts 道藏輯要 Daozang jiyao (DZJY). It was started and initially led by the late Monica Esposito (1962-2011), with active contributions from Mugitani Kunio 麦谷邦夫 and others; it was sponsored by the JSPS (project #20242001, 2008-2011) and by two grants from the Chiang Ching-kuo Foundation for Scholarly Exchange (projects #RG003-P-05 (2006-2009) and #RG006-P-09 (2010-2013)). As part of this project, the entire text of the DZJY was input and proofread. In addition, a digital version of the text of the 正統道藏 Zhengtong Daozang (ZTDZ) was acquired, and a set of high-quality digital images was integrated to bring the collection to its present form. We also received another digital version, curated and proofread, from the 道教學術資訊網站 Daojiao xueshu zixun wang zhan [ctcwri], but this has not yet been thoroughly incorporated into the repository.
Chan database project (since 2008)
This project was conceived as a collaboration between the late John McRae (1947-2011), Christoph Anderl, and Christian Wittern. We have not yet received any substantial funding, but we exchanged data (within the scope of the project) and prepared an enhanced version of the 祖堂集 Zutang ji (KR6q0002), as well as of some editions of the Chan texts 景德傳燈錄 Jingde chuandeng lu (KR6q0003) and 五燈會元 Wudeng huiyuan (KR6q0012). Part of this work was made possible with funds from CIEAS.
(4) 2013 - 2016: Fundamental topics in Digital Humanities
Several times during the above-mentioned phases, efforts were made to create an overarching framework and common format to better enable comprehensive research on these texts. After 2010, the germ of an idea had taken shape and I started running some tests on how such a framework could be realized. The essential need is a systematic arrangement of the texts in a common, but very simple, format, based on the classical four sections of literature, but enhanced with additional categories for Daoist (KR5) and Buddhist (KR6) texts, as well as an analytical catalog and tools for efficiently interacting with the text corpus.
In April 2013, a research team was formed and began to meet regularly [dhbasic] to discuss methodological and practical issues related to this work, to ensure that the results would have the broadest possible application. All this has led to the current collection of around 10,000 texts. In addition to the texts and collections already mentioned, these are drawn mainly from the following collections:
- 四庫全書 Siku quanshu
- 四部叢刊 Sibu congkan
- 維基文庫 Wikisource.zh
四庫全書 Siku quanshu (SKQS)
The Siku quanshu, with more than 3,400 texts, is the largest single collection in the Kanseki Repository after the CBETA collection (which has roughly 1,100 more texts). Unlike the CBETA collection, however, the SKQS has not been acquired as a single entity, but rather pieced together over time. It may be useful to recount how this was done and describe the current state of the collection, which is still far from complete.
In the early years of the 21st century, more and more pre-modern Chinese texts became available online. In early 2007 I discovered a potential treasure trove of works, which inspired me to attempt a systematic collation of the texts that belong to the SKQS.
Some years later, in addition to the transcribed texts, digital facsimiles became available at the Internet Archive. These turned out to be from the 文瀾閣 Wenlan ge copy of the SKQS, located today at the Zhejiang Library (浙江圖書館) in Hangzhou. Unfortunately, this edition, which was originally held in the Wenlan Pavilion on Gushan Island in the West Lake in Hangzhou, was scattered during the years of the Taiping wars, and only later in the 19th century, through the efforts of the brothers 丁丙 Ding Bing (1829-1887) and 丁申 Ding Shen (1832-1899), could a part of this collection be restored and moved to the Zhejiang Library. In addition to the problematic state of the source itself, the way the digital facsimiles are cataloged and made available in the Internet Archive makes it very difficult to arrange them in order, align them to the transcribed text, or assess their completeness. It was fortunate, therefore, that in early 2015 the opportunity arose to purchase a complete set of digital facsimiles of the 文淵閣 Wenyuan ge edition. These were scanned from a reprint by the Taiwan branch of Commercial Press (商務印書館 Shangwu yinshuguan), the same source used as the reference for the curation of the transcribed versions of the texts. These facsimiles made it much easier to produce a version that aligned the textual representation and the digital facsimiles. At the time of this writing, this work has been completed for about 2,900 texts, but since the easiest of the texts were prepared first, the remaining texts will require considerably more effort in editing and alignment.
四部叢刊 Sibu congkan (SBCK)
Since the texts in the SKQS are sometimes heavily edited and not always the best possible source, texts have also been aligned to the SBCK where possible. This is a much smaller collection of fewer than 400 texts, of which some 100 are not in the SKQS, thereby providing additional material. Again, digital facsimiles could be purchased, but the work to establish the text and align it to the facsimiles is time-consuming and ongoing. Thus, only very few texts have been processed so far.
維基文庫 Wikisource.zh
This is the Chinese version of the Wikisource website, which has a few hundred texts. While these texts do not indicate the edition used, they come in modern, well-punctuated editions and can thus be used in addition to the sources mentioned, especially in the master branch.
At the time of this writing, almost 9,000 texts, or 90% of the texts in the catalog, have been released and are freely available in the @kanripo account on the GitHub website [ghkanripo]. Most of these texts have not yet been vetted according to the strict philological methodology outlined above. They are placed here partly in the hope that they will nevertheless prove useful as they are, but more importantly because the software interfaces described in the next chapter will allow users to work independently on texts of interest, without interference from the editors of the Kanseki Repository.
Content of the Kanseki Repository
The content in the Kanseki Repository is organized similarly to a traditional Chinese library, using the four divisions of literature first used to organize the imperial library, which later served as the principle for classification of other collections as well. Since this traditional classification scheme has a built-in bias against Daoist and Buddhist texts, which play an important role in the Kanseki Repository, these two sections have been elevated to stand alone as top-level divisions, thus increasing the overall number of top-level divisions to six.
The bibliographic treatise of the 隋書 Sui shu (KR2a0023), compiled under the leadership of 魏徵 Wei Zheng (581-643), was also organized in this way, at a time when the enormous production of Buddhist and Daoist texts pushed the limits of the traditional classification scheme. These two sections were later moved to the end of the 子部 Zi bu, but in most cases only a very small selection of Buddhist and Daoist texts was ever deemed worthy of admission to the libraries. The current selection thus revives this classification scheme of the early Tang, as shown in Table 1.
Table 1: Top-level divisions of the Kanseki Repository
|KR1||經部 Jing bu||Confucian Classics (incl. music, dictionaries and elementary learning)|
|KR2||史部 Shi bu||Historiography and politics|
|KR3||子部 Zi bu||Masters, philosophers and treatises|
|KR4||集部 Ji bu||Anthologies (Poetry and Collected Writings)|
|KR5||道部 Dao bu||Daoist texts|
|KR6||佛部 Fo bu||Buddhist texts|
The appropriate place in this classification has to be determined for every text in the Kanseki Repository. In most cases, this follows the precedents set by previous text collections, with some important differences, as outlined below.
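As an illustration of how a text number encodes its place in this classification, the following minimal Python sketch splits a number such as KR2b0007 into its parts. The division names come from Table 1. The identifier shape ("KR", a division digit, a class letter, and a four-digit serial number) is an assumption inferred from the numbers quoted in this section, and the names `DIVISIONS`, `KR_ID` and `parse_kr_id` are hypothetical helpers, not part of any Kanseki Repository software.

```python
import re

# Top-level divisions, copied from Table 1.
DIVISIONS = {
    "KR1": "經部 Jing bu",
    "KR2": "史部 Shi bu",
    "KR3": "子部 Zi bu",
    "KR4": "集部 Ji bu",
    "KR5": "道部 Dao bu",
    "KR6": "佛部 Fo bu",
}

# Assumed shape: "KR" + division digit + class letter + four-digit serial.
KR_ID = re.compile(r"(KR[1-6])([a-z])(\d{4})")


def parse_kr_id(text_id):
    """Split a text number into division, class and serial number."""
    match = KR_ID.fullmatch(text_id)
    if match is None:
        raise ValueError(f"not a recognized text number: {text_id!r}")
    division, letter, serial = match.groups()
    return {
        "division": division,
        "division_name": DIVISIONS[division],
        "class": division + letter,
        "serial": int(serial),
    }


info = parse_kr_id("KR2b0007")  # 資治通鑑 Zizhi tongjian
print(info["division_name"], info["class"], info["serial"])
```

Numbers that do not fit the assumed pattern raise a `ValueError`, so a caller notices unexpected identifiers instead of silently misclassifying them.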
Another question concerns the exact delimitation of a text: where exactly are the boundaries between different texts within a collection, and should a given set of pages be considered a single text or several? The problem can sometimes be seen in anthologies, especially when an anthology is itself part of a larger anthology. The ZTDZ, for example, contains the work 修真十書 Xiuzhen shishu "Ten texts of practices for obtaining the way", an anthology of 10 texts of very different length and widely varying characteristics, written by different authors at different times. Some catalogs consider it to be one text and assign it a single entry. Since research may require a focus on only a subset of these texts, for the purposes of the Kanseki Repository these texts have been considered separately, with each assigned a different number (KR5a0264 to KR5a0275, making, in fact, 12 entries). This decision was based on the notion that it is always desirable for a catalog to be as fine-grained as possible. On the other hand, this also means that the Xiuzhen shishu does not exist as a single textual entity in the Kanseki Repository.
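The count of 12 entries can be checked with a small Python sketch that expands a range of text numbers. It assumes that the last four characters of a number are a zero-padded serial and that all numbers in a range share the same prefix; `expand_id_range` is a hypothetical helper introduced only for this illustration.

```python
def expand_id_range(first, last):
    """List every text number from first to last, inclusive."""
    prefix, start = first[:-4], int(first[-4:])
    last_prefix, end = last[:-4], int(last[-4:])
    if prefix != last_prefix:
        raise ValueError("both numbers must belong to the same class")
    if end < start:
        raise ValueError("range end lies before range start")
    # Re-attach the class prefix to each zero-padded serial number.
    return [f"{prefix}{serial:04d}" for serial in range(start, end + 1)]


xiuzhen_shishu = expand_id_range("KR5a0264", "KR5a0275")
print(len(xiuzhen_shishu), xiuzhen_shishu[0], xiuzhen_shishu[-1])
```

Because the anthology has no entry of its own, a program that wants to treat the Xiuzhen shishu as a whole would have to work from such a list of its parts.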
The four divisions of Chinese literature
As explained, the first four top-level divisions of the Kanseki Repository correspond to the divisions that are commonly used in many libraries and collections, such as the 四庫全書 Siku quanshu (SKQS) and the 四部叢刊 Sibu congkan (SBCK). When it comes to further divisions below this top level, there is considerably more variation, though. While the SBCK does not have any further subdivisions, the SKQS has up to two more levels: the 類 lei and the 屬 shu. For example, the fourth second-level class is 禮類 Li lei, the 'Ritual Books,' which is further subdivided into the 周禮之屬 Zhouli zhi shu, the 'Rites of Zhou,' and five more third-level classes. Other second-level entries, like the 易類 Yi lei, 書類 Shu lei, and 詩類 Shi lei, do not have any further subdivision.
In order to standardize classification depth, the hierarchy of the Kanseki Repository has been limited to just one subdivision. Thus, all books in the Rites section have been grouped at the same level. Initially, the sequential order of the subdivisions on the 屬 shu level is preserved, but as the repository expands and more texts are added, this order will become distorted, because newly added texts will always have to be added to the end of the relevant class.
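The consequence of always appending to the end of a class can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `next_text_id` is hypothetical, the numbers in the example are made up, and the actual procedure used by the editors for assigning numbers is not described here.

```python
def next_text_id(existing_ids, text_class):
    """Return the number a newly added text would receive in text_class.

    The new text is appended after the highest serial already in use,
    regardless of where it would belong topically or chronologically.
    """
    serials = [
        int(text_id[len(text_class):])
        for text_id in existing_ids
        if text_id.startswith(text_class)
    ]
    return f"{text_class}{max(serials, default=0) + 1:04d}"


# Made-up catalog excerpt, for illustration only.
catalog = ["KR1d0001", "KR1d0002", "KR1d0003", "KR1a0001"]
print(next_text_id(catalog, "KR1d"))
```

A text appended this way sits after all earlier entries of its class, which is why the inherited 屬 shu order cannot be maintained as the collection grows.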
In most cases, the divisions on the 類 lei level are as seen in the SKQS. The only exception to this is the 別集類 Bieji lei, "Collections of individual authors". Since there is such a large number of texts in this category, the subdivisions, which here reflect the time of the text creation by dynastic period as they are found in the SKQS, have been moved up to the 類 lei level as shown in Tables 2 and 3.
As for the arrangement of the texts within a class, our basic guideline is that texts are grouped by topic and usually by date or assumed date of creation, in such a way that related topics can be found together. In cases where several topical groups exist within one class, as in the example of the 禮類 Li lei (ritual books) above, the groupings have been preserved so that related texts still occur together. There is no overarching chronological arrangement, since the original chronological arrangement was within the subdivision, and this order gets preserved.
Furthermore, root texts are usually placed before commentaries and treatises that deal with them. Since most non-Buddhist texts in the Chinese tradition have been transmitted intermingled with commentaries or groups of commentaries, for some important texts with many commentaries a separate modern version containing only the root text has been created. Examples include the 周易 Zhou yi (KR1a0001) 'Book of Changes' and the 尚書 Shang shu (KR1b0001) 'Book of Documents'. In these cases, the characters 正文 Zhengwen 'root text' have been added to the title.
This has been harmonized in the Kanseki Repository as shown in Table 3.
Amongst other changes that have been necessary, all texts of the sections 釋家類 Shijia lei and 道家類 Daojia lei in the SKQS have been moved to the corresponding sections of the newly created 佛部 Fo bu and 道部 Dao bu, as detailed in Tables 4 and 5.
The Daoist material is mainly compiled from the collections 正統道藏 Zhengtong daozang (ZTDZ) and 道藏輯要 Daozang jiyao (DZJY). Titles from the 中華道藏 Zhonghua daozang (ZHDZ) that are drawn from the Dunhuang collection have also been added, but these texts are not yet in the repository.
In arranging the material in this section, the seven-part division that can be seen in the ZTDZ of the early Ming has been preserved (corresponding to subsections KR5a to KR5g). This is followed by the Supplement to the Daoist Canon, which was added to the Daozang towards the end of the Ming dynasty. Wherever possible the texts in the DZJY have been positioned as additional editions with the corresponding texts of the ZTDZ. For the texts not in the ZTDZ, an additional subsection 清代道教文獻 Qingdai daojiao wenxian, "Daoist texts from the Qing dynasty" has been added. Additional Qing material not in the DZJY is placed here as well. This is followed by a subsection for Daoist texts excavated in Dunhuang, but no texts have yet been added to this subsection.
The Buddhist texts are mostly collected and converted from the fine collection prepared by 中華電子佛典協會 CBETA (see [cbeta], [cbetap5]) and follow the general structure developed there. However, since CBETA does not include the parts of the 大正新脩大藏經 Taishō shinshū daizōkyō 'Taishō Tripitaka' that contain commentaries by Japanese authors or texts pertaining to Japanese Buddhist sects, these parts have been added anew into the overall classification.
List of text witnesses
The raison d'être of Kanripo is to enable and facilitate any kind of research that can make use of the texts it contains. The texts in this digital library can be accessed through different interfaces, including machine and software interfaces. Another purpose of Kanripo is to render the texts and the entire library programmable. For this reason, every repository needs to have a master branch, which is essentially the "best" text, i.e., the text that would be published as the copy text of a critical textual edition. A program trying to access the library in Kanripo should normally work on the master branch, unless there is a special interest in some other particular version or other versions in general.
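As a rough sketch of what working on the master branch could look like for a program, the following Python function builds a `git clone` command for a single text. It rests on two assumptions that are not confirmed in this section: that each text is published as its own repository named after its text number under the @kanripo account on GitHub, and that other versions are kept in branches of the same repository. The function `clone_command` is hypothetical; `--branch` and `--single-branch` are standard `git clone` options.

```python
def clone_command(text_id, branch="master", account="kanripo"):
    """Build the git command that fetches one branch of one text.

    The URL layout (one repository per text number under the account)
    is an assumption made for this example.
    """
    url = f"https://github.com/{account}/{text_id}.git"
    return ["git", "clone", "--branch", branch, "--single-branch", url]


# Default: the master branch of the 祖堂集 Zutang ji (KR6q0002).
print(" ".join(clone_command("KR6q0002")))
```

The returned list can be passed to `subprocess.run` to perform the download. Keeping `master` as the default mirrors the recommendation above, while the `branch` parameter leaves room for programs with a special interest in another version.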
Table 2: Classification of the 集部 Ji bu in the SKQS
|SK4||集部||Jibu||Anthologies and belles-lettres|
|4a||楚辭類||Chuci||Poetry of the South|
|4b||別集類||Bieji||Collections of individual authors|
|4b1||漢至五代||Han zhi Wudai||Han to Five Dynasties periods|
|4b2||北宋建隆至靖康||Bei Song||Northern Song period (Jianlong to Jingkang reigns)|
|4b3||南宋建炎至德佑||Nan Song||Southern Song period (Jianyan to Deyou reigns)|
|4b4||金至元||Jin zhi Yuan||Jin and Yuan empires|
|4b5||明洪武開至崇禎||Ming||Ming period (Hongwu to Chongzhen reigns)|
|4b6||清代||Qing dai||Qing period|
|4c||總集類||Zongji||Anthologies and collections of many authors|
|4d||詩文評類||Shi-wenping||Critique on poetry and prose|
|4e||詞曲類||Ciqu||Poetry and arias|
|4e3||詞話之屬||Cihua||Explanations to poems|
|4e4||詞譜韻之屬||Cipuyun||Rhymes and musical notes to poems|
|4e5||南北曲之屬||Nanbeiqu||Arias from the North and the South|
Table 3: Classification of the 集部 Ji bu in the Kanseki Repository
|KR4||集部||Jibu||Anthologies and belles-lettres|
|KR4a||楚辭類||Chuci||Poetry of the South|
|KR4b||別集類-漢至六朝||Bieji Han zhi Liuchao||Individual collections Han to Six Dynasties|
|KR4c||別集類-唐||Bieji Tang||Individual collections Tang|
|KR4d||別集類-宋||Bieji Song||Individual collections Song|
|KR4e||別集類-明||Bieji Ming||Individual collections Ming|
|KR4f||別集類-清||Bieji Qing||Individual collections Qing|
|KR4g||別集類-近代||Bieji Jindai||Individual collections after Qing|
|KR4i||詩文評類||Shi-wenping||Critique on poetry and prose|
|KR4j||詞曲類||Ciqu||Poetry and arias|
|SKQS ID||KR ID||Title|
|SKQS ID||KR ID||Title|
|KR5i||清代道教文獻||Qingdai daojiao wenxian|
|KR5j||敦煌道教文獻||Dunhuang daojiao wenxian|
|KR6t||續諸宗(日本)||Xu zhuzong (Riben)|
|KR6u||敦煌文獻類||Dunhuang wenxian lei|
|WYG||【四庫全書・文淵閣】||Wenyuan edition of the SKQS|
|SBCK||【四部叢刊】||1919 edition of the SBCK|
|ZTDZ||【正統道藏・三家本】||Reprint of the Zhengtong Daozang|
|YP-C||【原版道藏輯要】||First edition of the Daozang jiyao|
|CK-KZ||【重刊道藏輯要】||New edition of the DZJY (Kaozheng reprint)|
|CBETA||【電子佛典集成】||CBETA collection of Buddhist texts|
|TKD||【高麗藏・東國影印版】||Tripitaka Koreana, Dongguk University reprint|
|T||【大】||Taishō Shinshū Daizōkyō|
|J||【嘉興】||Ming canon (Jiaxing edition)|
|DCS||【東禪寺】||Edition of the Dongchan Temple in Fuzhou|
|L||【乾隆大藏經】||Qing Canon (Qianlong edition)|
|P||【永樂北藏】||Ming canon (Northern Yongle edition)|
|GOZAN||【日本五山版】||Gozan editions (Japan)|
|S||【宋藏遺珍】||Remains of the Song canon|
Online References
- [tk-desc] EBTI meeting 1996, Tripitaka Koreana, link.
- [wwwdb] WWW Database of Buddhist Texts, link.
- [tkb] Tang Knowledgebase Project, link.
- [ctcwri] 道教學術資訊網站 Daojiao xueshu zixun wang zhan, link.
- [dhbasic] 人文情報学の基礎研究 – Fundamental topics in Digital Humanities, link.
- [ia] Internet Archive, Wayback Machine, link.
- [ghkanripo] Kanseki Repository at GitHub, link.
- [cbeta] Chinese Buddhist Electronic Text Association 中華電子佛典協會, link.
- [cbetap5] CBETA P5 XML on GitHub, link.
- [kr-catalog] Kanseki Repository Catalog, link.