4 Some use cases for the Kanseki Repository

Electronic texts for researchers from researchers.

Overview

In this section four examples of how to make use of the data in the Kanseki Repository will be presented:

  • (I) Correcting and reporting problems in the data
  • (II) Text and translation
  • (III) Workflow for accessing and sharing texts
  • (IV) Analytical processing

These processes are increasingly complex, but each of these sections stands alone and can be read and understood separately.

(I) Correcting and reporting problems in the data

The editors of the Kanseki Repository rely on contributions from the user community to improve the accuracy of the texts. For reasons of efficiency and transparency we recommend that the reporting mechanism built into GitHub and git is used for this purpose. Here we present an example of how this can be done. The procedure is still somewhat tedious and overly complicated, since it follows the general protocol for reporting problems in the source code of the software. Hopefully, the procedure can be streamlined to reduce the steps needed by the user, but the more important steps will still be necessary. Here is an outline:

  • (1) Spot an error or problem in the text. Make sure that the transcribed version and the digital facsimile where the problem has been noted represent the same edition of the text (see footnote 1).
  • (2) Fork the text in question to your user account.
  • (3) Open the text for editing in GitHub to make the necessary changes.
  • (4) Commit the changes. These should now become visible if you look at the page while signed into your account on kanripo.org.
  • (5) If everything looks fine you can now report the problem to @kanripo by creating a pull request.
  • (6) The editors will respond to your request and if there is no problem merge it into the branch on @kanripo.
  • (7) The corrected text will now be available to all users.

The following example illustrates these steps in more detail.

(1) The problem

In this case, while looking at a text discussing the 太極圖 Taiji diagram in Figure 1, we notice a problem: The characters for the five elements 五行 wu xing have not been entered, as seen by the fact that the text contains circled characters.

krp-pr-problem.png

Figure 1: The problem we want to correct

(2) Forking the text

In Figure 1, the user krptest is currently logged in to kanripo.org. Clicking on the link labeled GitHub takes the user to the corresponding page on the GitHub account of @kanripo, as seen in Figure 2.

krp-pr-kanripo.png

Figure 2: The GitHub page for text KR3a0023 太極圖說述解-明-曹端

Clicking on the fork button will initiate the forking process and after a short time the text should appear in the user's account, as shown in Figure 3. When the user now views this text on the kanripo.org website, the copy in @krptest's account will be used in preference to the one in @kanripo's account (see footnote 2).

krp-pr-krptest.png

Figure 3: The text is now forked in @krptest's account

(3) Open the text on GitHub for editing

With the forked version of the text ready, a click on the GitHub link in Figure 1 will now open the text directly in @krptest's account for editing. This is always the fastest and safest method to open the file for editing, since it will also be automatically located on the correct branch, WYG in this case.

krp-pr-editing.png

Figure 4: Text KR3a0023 in editing mode

Figure 4 shows the text in editing mode on GitHub. The location we want to change in line 145 is highlighted. Since the page on kanripo.org that showed the transcribed text and the digital facsimile side by side is no longer visible, it is a good idea to open the GitHub link in a new browser window. This will make it possible to view both at the same time, as shown in Figure 5. This figure also shows the character viewer used to input the desired characters. This will of course not always be necessary or even available, since it is a feature of the underlying operating system.

(4) Commit the changes

Figure 6 shows the situation where the characters have been added and the work is ready to be committed. It is good practice to include a commit message that explains what has been done. Figure 7 below shows the page reloaded for the user @krptest. The five elements are now readable.

(5) Make a pull request

If everything looks good, a pull request can be made. This is done from @krptest's GitHub page for the text. A pull request can be initiated by pressing the green "Create pull request" button shown in Figure 3. Care is needed to ensure that the correct branch is designated, as shown in Figure 8. Here, the "base fork" (the target of the pull request) is set to branch "WYG" of "kanripo/KR3a0023", and the "head fork" (the origin of the pull request) has been set to the same branch of "krptest/KR3a0023". If the branches are not set correctly, the pull request will be of no use. To make it easy to confirm that everything is correct, a comparison of the two versions is shown at the bottom of the screen. A system message "Able to merge" indicates that the branches can be automatically merged. This is a necessary condition for a successful pull request. Since everything is fine, @krptest now just needs to click on the green button "Create pull request" to actually issue the request. As the text next to the button suggests, this will initiate a discussion about the changes with the editors and possibly other users.
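The branch settings described above can also be written down as data. The following Python sketch assembles the URL and JSON body that GitHub's REST API expects when a pull request is created (POST /repos/{owner}/{repo}/pulls). It sends nothing over the network. The function name and the title text are illustrative assumptions and not part of the Kanseki Repository tooling.

```python
import json


def build_pull_request(base_owner, head_owner, repo, branch, title):
    """Return (url, payload) for a pull request between two forks.

    Hypothetical helper for illustration; no request is actually sent.
    """
    if base_owner == head_owner:
        raise ValueError("base and head must belong to different accounts")
    # The request is addressed to the repository that receives the change.
    url = f"https://api.github.com/repos/{base_owner}/{repo}/pulls"
    payload = {
        "title": title,
        "base": branch,                    # branch of the base fork (target)
        "head": f"{head_owner}:{branch}",  # origin, written as "user:branch"
    }
    return url, payload


url, payload = build_pull_request(
    "kanripo", "krptest", "KR3a0023", "WYG",
    "Add missing characters for the five elements",
)
print(url)
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Keeping base and head on the same branch name ("WYG" here) mirrors the check that has to be made by eye in Figure 8.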

(6+7) Response from the editors and merge

Figure 9 shows how the pull request will look to users who have editing permission on @kanripo. Any discussion concerning this change will take place here, until the editors finally decide to press the button "Merge pull request". From this point on, the change will be available to all users of @kanripo. However, users who forked or cloned their version before this change will have to update their copies to take advantage of the change (see footnote 3).

Conclusion

The steps described here show how changes can be made to the text in the Kanseki Repository. The record of who initiated this change, when and why will continue to be available and can be verified by all users.

At the moment, the series of steps to initiate the change is still somewhat cumbersome. Future updates to the website will likely make this process easier, but the general sequence of forking, editing, committing, requesting a pull, discussing the change with the editors and the final merging by the editors will not change.

If this all sounds too complicated, or if the problem in the text is not so easily resolved, it is always possible to open an "issue" on the text. This is done by visiting the text in the @kanripo account on GitHub. When you view a particular text, the "GitHub" link on that page will lead to this location. (If you have already forked the text, however, it will lead to your own copy, but from there it is easy to go to the "origin" via the link on the page of the forked text.) A small button with a circled "?" and the word "Issues" will bring you to the "Issues" page. Here, a new issue can be opened by clicking the green "New Issue" button. Give it a short title and describe the problem as clearly as possible, including the edition, juan number and page of the problem.
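A useful issue report contains the same few pieces of information each time. As a minimal sketch, the Python helper below assembles a title and a description from the edition, juan number and page. The function name, the field layout and the sample values are assumptions made for illustration only.

```python
def format_issue(text_id, edition, juan, page, description):
    """Build a short title and a structured body for an issue report.

    Hypothetical helper; the layout is one possible convention, not a
    format required by the Kanseki Repository.
    """
    title = f"{text_id} ({edition}) juan {juan}, page {page}: problem in text"
    body = "\n".join([
        f"Text: {text_id}",
        f"Edition: {edition}",
        f"Juan: {juan}",
        f"Page: {page}",
        "",
        description.strip(),
    ])
    return {"title": title, "body": body}


# Invented sample values, for illustration only.
issue = format_issue(
    "KR3a0023", "WYG", 1, "3a",
    "The characters for the five elements are missing; "
    "the transcription shows circled placeholders instead.",
)
print(issue["title"])
print(issue["body"])
```

The title and body can then be pasted into the two fields of the "New Issue" form.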

krp-pr-two-windows.png

Figure 5: Two browser windows and the character viewer

krp-pr-ready-to-commit.png

Figure 6: Changes completed, ready to commit

krp-pr-changed.png

Figure 7: Confirm the changes in kanripo.org (visible only to @krptest)

krp-pr-create-pr.png

Figure 8: Confirm changes and create pull request

krp-pr-pr-ready.png

Figure 9: Pull request ready

(II) Text and translation

A frequent task that requires close reading is the preparation of a translation of a text. This section outlines the process for preparing a branch for translation and reformatting the text, and describes ways to facilitate this work.

The procedure can be broken down into several steps.

  • creating a new branch to contain the translation
  • reformatting the text, adding punctuation where necessary
  • adding the translation to the text
  • pushing the translation to the GitHub account (optional)

Creating a new branch

krp-trans-prep.png

Figure 10: Ready to create a new branch

The translation will be located on a new branch, together with the text. Since a translation requires long and continued effort, web browsers are not well suited to this task, which is why Mandoku is used. It is assumed that the program has been installed and set up as described in Chapter 3. The text of interest can be looked up, for example using the "Title search" function, called by pressing F7. As explained, the text can be displayed (enter the text name on the line) and downloaded (cloned) using the command "C-c d". In this case, a fork is created and the remote is added.

In most cases, it is convenient to start the translation from the master branch. As shown in Figure 10, with the text displayed, "C-x g" has been pressed and, in the Magit overview screen at the bottom of the window, "y" has been pressed to display the branch manager. This branch manager is divided into three parts. The topmost one is labelled "Branches"; this displays the local branches. The next part, labelled "Krptest", displays the branches connected to the forked text of @krptest. The third part shows the branches in the original location of the text, @kanripo. The cursor is positioned on the local master branch. We want to create a new branch based on this one, so we press "b" (for branching) followed by "c" (for "create and checkout"). (As mentioned, a help screen with available commands can be displayed by pressing "?".) On the line at the very bottom of the window, you are now prompted for the name of the branch to start from. "master" is suggested because that is the branch where the cursor is positioned. Pressing enter will confirm this and generate the next prompt, asking for the name of the new branch. We choose the name "trans-en" to indicate that this is going to be the branch for a translation into English. The name is of course entirely up to the user, although it is advisable to avoid capital letters. After pressing enter again, the action is executed, and a new branch appears under the local branches. "@" precedes the name to indicate that the branch is active. The header line of the text displayed at the top of the window changes to show that the active branch is "trans-en". We can now close the branch manager and the Magit status window by pressing the key "q" twice. This will restore the window displaying the text at full size.
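For readers who prefer to see the underlying git operation, the following Python sketch builds the command line that roughly corresponds to the Magit key sequence "b", "c" described above. It only constructs the argument list and does not run git. The helper name and the lowercase check are assumptions based on the advice in the text.

```python
def create_branch_command(new_branch, start_point="master"):
    """Return the git command that creates a branch and switches to it.

    Hypothetical helper; it mirrors "create and checkout" in Magit.
    """
    if not new_branch or " " in new_branch:
        raise ValueError("branch name must be non-empty and contain no spaces")
    if new_branch != new_branch.lower():
        # Follows the advice to avoid capital letters in branch names.
        raise ValueError("branch name should not contain capital letters")
    return ["git", "checkout", "-b", new_branch, start_point]


command = create_branch_command("trans-en")
print(" ".join(command))  # git checkout -b trans-en master
```

To actually execute the command, the list could be passed to subprocess.run with the folder of the downloaded text as working directory.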

krp-trans-show.png

Figure 11: Mandoku menu Display->Show markers

krp-trans-markers.png

Figure 12: Text with visible markers

krp-trans-start.png

Figure 13: Text reformatted for translation

krp-trans-done.png

Figure 14: Translation done

krp-trans-kanripo.png

Figure 15: Translation on the Kanripo.org website

Reformatting the text

Before actually starting the translation, we will reformat the text a little to make it possible to write the translation of the text on the same line as the text itself. This is useful for a raw translation, since it makes it easy to look for the translations of terms that have already been encountered. We use a "one phrase, one line" (一逗一行) format here. While editing the file, it is a good idea to show the hidden markers that indicate page and line breaks. This can be done from the menu using "Mandoku > Display > Show markers" (Figure 11). As shown in Figure 12, a "¶" character appears at the end of every line. In the Mandoku text format, this indicates the end of a line in the base edition. Furthermore, page numbers are now displayed in full, to show not only the current page, but also the text number (KR6q0332) and the edition (X in this case, which stands for the 新纂大日本續藏經 Shinsan dainippon zokuzōkyō). This information is important for the correct functioning of the system, so it should not be deleted. However, the file can be rearranged as needed without causing problems, as long as the sequence of characters is not altered.
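The rule that only the line breaks may move, while the sequence of characters stays untouched, can be illustrated with a small Python sketch. It is not part of Mandoku; the function name, the set of punctuation marks and the sample text (a familiar passage with invented line breaks) are assumptions for illustration, and page markers are not treated specially.

```python
import re

# Punctuation after which a new line is started (an assumed, adjustable set).
PHRASE_END = "，。；：？！、"


def one_phrase_per_line(text, breaks=PHRASE_END):
    """Rearrange text so that each phrase sits on its own line.

    Only line breaks are moved; every other character, including the
    "¶" markers, keeps its place in the sequence.
    """
    flat = text.replace("\n", "")
    # Break after each run of punctuation, keeping a directly following
    # "¶" on the same line as the phrase it closes.
    pattern = "([" + re.escape(breaks) + "]+¶?)"
    result = re.sub(pattern, r"\1\n", flat).rstrip("\n")
    # Safety check: apart from line breaks, nothing may have changed.
    assert result.replace("\n", "") == flat
    return result


sample = "天地玄黃，宇宙¶\n洪荒。日月盈昃，¶\n辰宿列張。¶\n"
print(one_phrase_per_line(sample))
```

Where a line of the base edition ends in the middle of a phrase, the "¶" marker simply ends up inside the new line, which is in keeping with the rule stated above.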

Starting the translation

Figure 13 shows the file with lines reformatted, ready for starting the translation. This can of course also be done line by line if the translator prefers. The markers can also be hidden again (using the same menu command as above) to reduce distraction while working with the text. Figure 14 shows the result of translating the first few lines. Translation and text have been separated by a "tab" character. (To enter a "tab" use "C-q C-i". This complicated key combination is necessary because the tab key is already assigned for other purposes.) The tab-stop has been set to 30 (M-x set-variable <enter> tab-width <enter> 30 <enter>).
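Because text and translation are separated by a single tab character, the file remains easy to process by program. The Python sketch below splits such lines into pairs and looks up how a term has been translated so far. The function names and the sample lines (including the English renderings) are invented for illustration.

```python
def parse_bilingual(lines):
    """Split each line at the first tab into (source, translation)."""
    pairs = []
    for line in lines:
        source, _, translation = line.rstrip("\n").partition("\t")
        pairs.append((source, translation.strip()))
    return pairs


def find_translations(pairs, term):
    """Return the translated lines whose source text contains the term."""
    return [(s, t) for s, t in pairs if term in s and t]


# Invented sample lines; the third one has not been translated yet.
lines = [
    "道可道，\tThe way that can be spoken of\n",
    "非常道。\tis not the constant way.\n",
    "名可名，\n",
]
pairs = parse_bilingual(lines)
for source, translation in find_translations(pairs, "道"):
    print(source, "=>", translation)
```

Lines without a tab are kept with an empty translation, so untranslated passages can be listed just as easily.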

Pushing to the GitHub account

krp-trans-magit.png

Figure 16: Magit display: Unstaged changes, no remote to push branch trans-en for @krptest

If the translation is pushed to the GitHub account, it will also be available to @krptest when using the kanripo.org website, and it can be used on other computers. It can also be used as a backup in case of emergency.

To initiate the push, we again press "C-x g" to call up Magit. Now the display looks like Figure 16. There are two things to note: (1) the push destination is absent (because the branch was created locally, not pulled from a remote) and (2) there are "unstaged changes", which means there are changes to the file that have not yet been committed to the internal registry of changes. Before pushing, we therefore need to stage and commit the changes. The easiest way to do this is to press "c" two times. This will call up the buffer for editing commit messages, as shown in Figure 17. The cursor is at the bottom of the window, where the commit message is entered. All the other parts are for information only. The top part of the window shows the changes to be committed, while the bottom part below the cursor shows the files affected. After writing a short message to explain what has been done, pressing "C-c C-c" (that is, "C-c" twice) will conclude the action. The part of the window about "Unstaged changes" now disappears.

krp-trans-commit.png

Figure 17: Commit message editor and display of changes

krp-trans-magit-popup.png

Figure 18: Magit popup menu: "p" will push to krptest/trans-en

We can now initiate the actual push. Pressing "P" (i.e., capital letter “P” or shift-"p") brings up another popup buffer as shown in Figure 18. This buffer indicates what keys are available to conclude the action. Here the second section, labelled "Push trans-en to" is most relevant. The first item in this section is the one we need. This will push the commit we just made to the user account on GitHub. After pressing p to initiate this, the program might prompt for the GitHub user name and password if these have not been saved before, but usually it will do its magic and then display the commit message just entered for both the local and the remote branch.
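Expressed as plain git commands, staging, committing and pushing come down to three steps. The Python sketch below only builds the argument lists and does not run them. The helper name and the commit message are assumptions; the remote name "krptest" is taken from the label shown in Figure 18.

```python
def commit_and_push_commands(message, remote, branch):
    """Return the git commands for staging, committing and pushing.

    Hypothetical helper; it lists the commands but executes nothing.
    """
    if not message.strip():
        raise ValueError("a commit message is required")
    return [
        # Stage all changes to files that git already tracks.
        ["git", "add", "--update"],
        # Record the staged changes together with the explanation.
        ["git", "commit", "-m", message],
        # Push the branch and remember the remote as its push destination.
        ["git", "push", "--set-upstream", remote, branch],
    ]


for command in commit_and_push_commands(
        "Translate the first lines", "krptest", "trans-en"):
    print(" ".join(command))
```

Setting the upstream on the first push addresses the missing push destination noted for a branch that was created locally.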

The change is now available on GitHub and it is also visible on kanripo.org. However, we first need to tell the website that we now want to see the translation branch and not the master branch of this text. This is done by adding the line "KR6q0332=krptest/KR6q0332/trans-en" (the "user account/text number/branch") to one of the configuration files, as shown in Figure 26. Once this is set and loaded, the result should be visible for the user @krptest as in Figure 15. It should also be noted that the change on the GitHub repository will be visible to everybody who visits the page there, not just the owner. Therefore, if you do not yet want to share unfinished work, it is better to use a private repository, as explained in the next part, or simply avoid pushing to the repository on GitHub.
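The structure of this configuration line can be made explicit with a short Python sketch that splits it into its parts. The class and function names are assumptions for illustration, and the code is not the parser used by kanripo.org.

```python
from typing import NamedTuple


class BranchSetting(NamedTuple):
    text_id: str     # the text the setting applies to
    account: str     # GitHub account that holds the copy
    repository: str  # repository name, normally the text number
    branch: str      # branch to display on the website


def parse_branch_setting(line):
    """Parse a line of the form "text=account/repository/branch"."""
    key, sep, value = line.strip().partition("=")
    # Split at the first two slashes only, so that a branch name
    # containing a slash stays in one piece.
    parts = value.split("/", 2)
    if not sep or not key or len(parts) != 3 or not all(parts):
        raise ValueError(f"not a valid branch setting: {line!r}")
    return BranchSetting(key, *parts)


setting = parse_branch_setting("KR6q0332=krptest/KR6q0332/trans-en")
print(setting.account, setting.branch)  # krptest trans-en
```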

(III) Workflow for accessing and sharing texts

Sometimes the reading and translating of a text is done by multiple members of a research group. In this case, a separate account should be set up for the research group, designated as an "organization" account, not a user account. The members of the group can then be added to the organization. This enables the group to work with a common, shared copy of the text in addition to, or instead of, maintaining a copy in every individual member's account.

Here, we will assume that the account of the research group is called "krp-zinbun" and has the permissions necessary to create private repositories. Private repositories cannot be directly forked from public repositories; they have to go through a local clone. An easy way to create such a local clone is to use Mandoku. Simply display the text as explained above and then press "C-c d" to download the text. We do not want a fork in this case, so simply answer "no" when prompted. This clone can then be pushed to a repository created as private on GitHub. A group administrator will have to set up the repository. (The details of this are beyond the scope of this introduction.)

We will also assume that the master branch of this text in the account @krp-zinbun will have the text all users see on kanripo.org, while individual users will have their own branches on which they do their work. Only when the work is ready will such a branch be merged into the common master. We will further assume that users will each be responsible for certain sections of the text, which they will prepare in private, and present the results of their work to the group later.

The steps that need to be completed for this task depend on whether the work is done on Mandoku or on the Kanripo.org website:

  • For work on Mandoku:
    • Clone (=download) the text from @krp-zinbun
    • Create a branch
    • Prepare the text
    • Add translation and notes
    • Merge and push
  • For work on Kanripo.org
    • Fork text to the user account
    • Create a branch
    • Prepare the text, add translation (this can be done outside the browser)
    • Merge and push
  • Confirm visibility to all (both methods)

Mandoku

The workflow on Mandoku is very similar to the one already described for translations prepared individually. The description offered now focuses on the differences.

Preparation

With the setup above, user @krptest is a member of the research group and is starting to work on a part of the text assigned to her. She will first need to get the files from the private repository, where contributions from others might already be recorded. First, she will look for the text needed in the catalog and display the file that contains the line "Don't edit this file. If you want to edit, press C-c d to download it first." However, instead of using "C-c d" to get the file from the @kanripo account, this time she uses the command "Mandoku > Maintenance > Download this text from other account" available on the menu (see footnote 4). Selecting this function now prompts for the name of the account to use, to which she enters "krp-zinbun". This will of course work only if she indeed has access to this repository. If everything goes well, the text will be cloned and will be available for editing, as shown in Figure 19.

krp-collab-ready.png

Figure 19: Magit display: On local branch master, ready to create a new branch starting here

Starting work

Before she actually starts editing the text, @krptest will create a branch where she can work without being affected by others. This branch is named "work", but any name will do. As shown in Figure 19, the user now creates a branch using the procedure explained above.

As soon as the line "@ work" appears under "Branches", the branch "work" is active, setup is finished and the actual work can start. While working, a digital facsimile of this page can be displayed at any time by pressing C-c i, as shown in Figure 24.

When the work is finished, it can be committed as usual (C-x g, then c and c). The work is still only on the local computer and not available to anybody else.

Merging and pushing

When @krptest is satisfied with her work, she will want to make it visible to the other members of the group. This is done by pushing it back to GitHub. Since she is on a different branch, however, she will first have to merge her work back into the master. Before doing so, she will check if anybody else has pushed to master in the meantime. To make sure the master is up-to-date, she will go to the branch manager (C-x g and then "y"), and change to the master branch. (This is done by pressing enter on the line with "master". "@" will then indicate the active branch. This requires that all changes have been committed.)

@krptest will now press "F" (shift-"f") to pull changes from GitHub into master. On the popup screen (Figure 21), this is done by pressing "u". The master branch is now up-to-date and @krptest can merge her changes in and push. Pressing "m" to merge will display a screen with additional options, as in Figure 22. To see what will be merged, "p" can be pressed to generate a preview. This confirms the branch from which the merge will take place ("work") and also shows what is going to be merged. The preview is displayed at the lower part of the screen, as in Figure 23. The lines of text in red are the lines that will be deleted; the green ones (only partially visible) are the lines that will replace them. Pressing "q" closes the preview, and if everything seems fine, "m" can be pressed to execute the merge.

Figure 25 shows the new situation. After "master", the description now reads "[origin/master: ahead 1]". This means that we are now one commit ahead of the master branch from the remote "origin". The bottom of the window shows that origin/master is still at the previous commit. Pressing "P" (push) followed by "u" (destination "origin/master") executes the push action. After this action, the master branch of KR5e0001 in the group's account @krp-zinbun will have the same content as the version on @krptest's computer.
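What "ahead 1" means can be illustrated with a toy model in Python, in which each branch is simply a list of commit identifiers. This is a simplification of how git compares branches, and the function name and identifiers are invented for illustration.

```python
def ahead_behind(local, remote):
    """Count commits found only in local (ahead) or only in remote (behind).

    Both arguments are lists of commit identifiers, oldest first.
    """
    local_ids, remote_ids = set(local), set(remote)
    ahead = sum(1 for commit in local if commit not in remote_ids)
    behind = sum(1 for commit in remote if commit not in local_ids)
    return ahead, behind


origin_master = ["a1", "b2", "c3"]        # state on GitHub
local_master = origin_master + ["d4"]     # after merging "work" locally
print(ahead_behind(local_master, origin_master))  # (1, 0)

origin_master = list(local_master)        # state on GitHub after the push
print(ahead_behind(local_master, origin_master))  # (0, 0)
```

A non-zero second number would indicate that somebody else has pushed in the meantime, which is why master is pulled before merging.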

Kanripo.org

krp-zinbun-fork-branch.png

Figure 20: Create a branch from krp-zinbun/KR5e0001

Forking the text / Creating a new branch

Depending on the policy of the team, a branch can either be created on a separate fork, or directly in the group's repository. In this case, as can be seen in Figure 20, a branch is created without prior forking.

Preparing the text and translating

The text can then be directly edited in the browser on the GitHub site, or the text can be copied to a different place and then pasted back in later.

Merging and pushing

The merge will again be initiated through a "Pull request" in the same way as above in "(I), (5) Make a pull request". In this case, however, a member of the group will need to have the necessary permissions to complete the pull request by merging into master.

Visible to all members

Regardless of how the change is initiated, it should now be visible on the website kanripo.org for all users who are members of the group @krp-zinbun, provided that they have requested this in their settings, as shown in Figure 26. (The settings are accessed most conveniently from the link to the file global.cfg on the profile page.) This means that all members, even those who are not using Mandoku, are able to see the results.

Conclusion

The combination of Mandoku and the Kanseki Repository provides new tools for collaboration. While these tools might initially seem complicated and cumbersome, they are easier to use than the explanation might suggest. In time, the tools will also improve. It should become possible, for example, to directly prepare printouts for research meetings without the need to use a word processor.

krp-collab-pull.png

Figure 21: Magit display: press "u" to pull into master

krp-pull-merge-confirm.png

Figure 22: Magit display: press "p" to preview the merge process

krp-collab-merge-preview.png

Figure 23: Magit display: changes that will be merged to the local branch master

krp-collab-complete-w-fac.png

Figure 24: The text after completing the merge, displaying a digital facsimile to the right

krp-collab-merge-complete.png

Figure 25: After merge: 1 commit ahead of origin

krp-collab-kanripo.png

Figure 26: Display in kanripo.org for members of @krp-zinbun

(IV) Analytical processing

In this last section, some advanced ways of working with the Kanseki Repository are presented. The example research challenge considered here is: "For a given list of terms, what are the texts most relevant to these terms?"

To answer this question, every single one of these terms will have to be searched in the database. For each term, the texts that contain the term are noted and the number of matches per text is calculated. Obviously, this is a task that cries out for automation, but the question is how can it be done?

With Mandoku it is actually quite straightforward to do this. Org-mode, on which Mandoku is based, enables executable code to be embedded within documents, in the manner of so-called 'literate programming'. Such documents can also be exported easily to a word processor format, or even to PDF. This makes it an ideal tool for reproducible research, which is a hot topic in some fields, because it allows the description of the research, the data that serves as the basis of research, and the programs used to analyze the data to be bundled together in one single package.

To demonstrate this, a document has been prepared that can be used to produce reports on data.

Downloading the document and preparing the data

A document that can be used to generate a report on the texts that contain the listed terms is available at mdplus. Download and save the document (to your Documents folder, for example), then open it in Emacs.

The input to this document has to be in a folder called "input", located at the same level as the file. In this case, it would be Documents/input. For this example, a text file containing the terms in the following list, named bing.txt, will be used.

Executing the analysis

krp-babel-srcblock.png

Figure 27: The code that will be executed with "C-c C-c"

Figure 27 shows a so-called "source block", which contains the code to be executed. To execute the code and start the analysis, move the cursor to the line starting with #+BEGIN_SRC, then press "C-c C-c" (that is, press “control-C” two times in succession). Emacs will ask you to confirm your intention of running the code with the prompt "Evaluate this emacs-lisp source block on your system? (yes or no)" in the minibuffer at the bottom of the screen. Type "yes" then press enter. This also needs to be done twice.

Emacs now performs searches in the Kanseki Repository for each of the terms in the files located in the "input" folder. The results of these searches are saved for later analysis. Note that for long lists, searching can take considerable time. Upon completion of this phase, another part of the program will look at the results and compile some reports.
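The counting step itself is simple, as the following Python sketch shows. It is not the Emacs Lisp code of the document described here, which queries the index server of the Kanseki Repository; a small in-memory dictionary with invented text identifiers and contents stands in for the repository.

```python
from collections import Counter


def count_matches(terms, corpus):
    """Return {text_id: Counter(term -> number of matches)}.

    Texts without any match are left out of the result.
    """
    results = {}
    for text_id, content in corpus.items():
        counts = Counter()
        for term in terms:
            n = content.count(term)  # non-overlapping occurrences
            if n:
                counts[term] = n
        if counts:
            results[text_id] = counts
    return results


# Invented stand-in data, for illustration only.
corpus = {
    "T1": "鬼病者，鬼病也。熱病。",
    "T2": "熱病之狀",
    "T3": "無關",
}
terms = ["鬼病", "熱病", "風病"]
for text_id, counts in count_matches(terms, corpus).items():
    print(text_id, dict(counts))
```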

Results

The intermediate results of the search are placed in a folder called data. A data folder is created for every input file and each one contains two further folders, raw and index. The raw folder records the results of the search as they are returned from the Kanseki Repository index server. These results are used to generate the report. The second folder, index, contains the formatted index as it would be displayed in Mandoku. The reports can link to these files to allow further inspection of the details of the results.

Figure 28 shows the first few lines of the generated report, with the texts listed in order of the highest number of matching terms. The blue text code is a hyperlink to a more detailed display, as shown at the top of Figure 30. The terms are again hyperlinks; the lower part shows the details of search results for the term 鬼病 gui bing. This is the same information displayed when a live search is conducted for this term and the links can similarly be used to jump to the location in the text. (In fact, it is possible to change settings so that the link performs a live search instead of calling up the results saved in the file.) Since this is simply a file in the same format as the texts of the Kanseki Repository, it can be edited, saved, and copied, for example to eliminate matches that are not relevant to the question at hand, or to annotate it with comments from the researcher.
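The ordering of such a report can be sketched in a few lines of Python. Here texts are ranked by the number of different terms they contain, with the total number of matches as a tie-breaker; whether the actual report uses exactly this criterion is an assumption, and the function name and sample data are invented.

```python
def rank_texts(results):
    """Sort texts by distinct terms matched, then by total matches.

    results maps a text identifier to a {term: count} dictionary.
    Returns a list of (text_id, distinct_terms, total_matches).
    """
    rows = [
        (text_id, len(counts), sum(counts.values()))
        for text_id, counts in results.items()
    ]
    # Negative keys give descending order; the identifier breaks ties.
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))


# Invented sample results, for illustration only.
results = {
    "T1": {"鬼病": 2, "熱病": 1},
    "T2": {"熱病": 5},
    "T3": {"鬼病": 1, "風病": 1},
}
for text_id, distinct, total in rank_texts(results):
    print(f"{text_id}\t{distinct} terms\t{total} matches")
```

Replacing the sort key is enough to obtain a different ordering, for example by total matches only.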

krp-babel-results-by-texts.png

Figure 28: Start of the report

krp-babel-results-by-section.png

Figure 29: Sections with the most matches

krp-babel-results-with-index.png

Figure 30: Detail for text KR6k0206 with index display for 鬼病

Conclusion

The report is generated by the function mdplus-print-results contained in the file. This particular function is also written in Emacs Lisp, but it could be written in any of various other languages (such as Python, Ruby, R, and Perl; see footnote 5) and executed in the same way from within Emacs. If a different analysis is required, it is usually sufficient to replace this function by another one that yields the desired results.

Having the source code for the analysis bundled in this way with analysis instructions, results, and collected and analyzed data makes it easier to share the source code and research findings with other researchers and enables other researchers to reproduce the results easily.

The functionality demonstrated in this last section depends mostly on org-mode, which is a part of Emacs and also the basis for the extension Mandoku. Org-mode has many more features that are relevant to researchers. For example, it allows research publications to be written in org-mode with the ability to export to LibreOffice, HTML and PDF formats. This functionality was used to generate this report.

Footnotes:

1
The master edition is of course an exception, since it does not represent one single edition. However, since that is an interpretative edition, any discrepancy might have been intended by the editors, so the request for a change has to be argued differently.
2
This is a default setting, but the setting can be changed if necessary, as demonstrated later (see Figure 26).
3
For a fork, this involves "rebasing" the fork; for a clone, it is simply a pull from the remote from which it was originally cloned.
4
This command will be available only if the text has not yet been downloaded. If the text has been previously downloaded from @kanripo, the text at @krp-zinbun can be simply added as an additional remote.
5
More information about this feature can be found in the Org Manual, Chapter 14 orgman.

Bibliography