Village DB: About
Welcome to the Roots Village Database, a digitization of the information from the Index of Clan Names By Villages published by the American Consulate General in Hong Kong in the 1970s. Originally used to investigate immigration fraud, this data is now valuable for genealogy research.
Currently, data entry for Toisan is complete. Hoiping data entry has just started. Please be patient; this is an entirely volunteer-driven project. Eventually we will have data from four counties, and all sorts of fancy indexes. :-)
The data here comes from the Index of Clan Names By Villages. There are four books, one each for Toishan, Sunwui, Hoiping, and Chungshan. Eventually, we'll put up the introduction from those books, but for now this note should suffice:
Note for the Reprint Edition
Thanks to Him Mark Lai for wanting this to happen in the first place; to Beatrice Yu, for setting up the groundwork; and to Tony Tong, for coordinating the project in its current form.
Q: What does "Map Location" in the Heungs mean? Are there gonna be maps?
A: According to the introduction of the Index, the map locations are keyed to the grid coordinates of the U.S. Army Map Service Series covering Kwangtung Province. We don't have these maps handy, but eventually we'd like to get our hands on some sort of map. Meanwhile, you can try the following pages:
Siyi Genealogy
Maps of Taishan County
Q: Friggin it's been two years since the last update. What the hey?
A: This database project is entirely volunteer-driven, so progress isn't as regular as one might hope. Also, all the programming is done by one person, me, so my hit-by-a-bus factor is rather high. As it turns out, in June 2002 I began suffering from RSI, or repetitive strain injury, which I'm still dealing with now, though fortunately I can type again. On top of that, I was out of the country (in Taiwan) during the last school year, which made it difficult to work on the database. But don't worry, I'm back now.
Moral of the story: DON'T IGNORE WRIST PAIN! Rest often. Improve your posture. Drink water. Stretch, exercise, get moving. Believe me, it sucks when it hurts just to write in your journal, or use chopsticks.
Read more about RSI
Typing Injury FAQ
For those of you who are curious, the database runs on MySQL as a backend, with Perl CGI scripts for the interface. MySQL is fast, free (open source), and can handle lots of data. Perl is just cool.
MySQL
The database is being developed on Mac OS X, with the help of BBEdit, CocoaMySQL, and Safari.
Mac OS X
Really Technical Notes
The Chinese characters on these pages are encoded in Big5-HKSCS, which is Big5 (the standard encoding for traditional Chinese) plus a bunch of Cantonese-specific characters that the Hong Kong government added on. (Big5 itself encodes 13,060 characters or so.) Input of Chinese is done through STC, or Standard Telegraph Code, which maps a 4-digit code to a character. Apparently, there are two different telegraph encodings, one for Taiwan and one for mainland China. The version used in our data is of the mainland variety, and apparently can be found in a book entitled 《電報明碼》. Naturally, this book is nowhere to be found (I haven't had the chance to beam over to Hong Kong and search the large bookstores there [update: I have, and it's still nowhere to be found, though I didn't have time to search the big libraries there]), and the various tables out on the internet are rife with mistakes. The telegraph data that this database uses is culled mainly from information put together by the Unicode people. Thank you, Unicode. This data, combined with a couple of other sources, gives us a telegraph code table of 7977 characters, which still appears to be missing a few. If anyone knows where I might find a more complete table, or the book, please let me know (email at bottom of page).
Unicode Data - look for Unihan.txt
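For the curious, here is a minimal sketch of how a telegraph code table like the one described above could be built from Unihan-style data. It is written in Python for brevity, not the Perl this site actually uses. It assumes the tab-separated `U+XXXX<TAB>field<TAB>value` line format and the `kMainlandTelegraph` field name. The function names and the handful of sample lines are purely illustrative, not the database's real code or data.

```python
# Sketch only: build a mainland telegraph code (STC) -> character table
# from Unihan-style lines. SAMPLE and both function names are hypothetical.

SAMPLE = """\
# comment lines start with '#'
U+4E00\tkMainlandTelegraph\t0001
U+4E01\tkMainlandTelegraph\t0002
U+4E03\tkMainlandTelegraph\t0003
U+4E00\tkTaiwanTelegraph\t0001
"""


def build_stc_table(lines, field="kMainlandTelegraph"):
    """Return a dict mapping 4-digit code strings to single characters."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split("\t")
        if len(parts) != 3 or parts[1] != field:
            continue  # skip other fields (e.g. the Taiwan codes)
        codepoint, _, value = parts
        char = chr(int(codepoint[2:], 16))  # "U+4E00" -> "一"
        # A value may list several codes separated by spaces;
        # the first character seen for a code wins.
        for code in value.split():
            table.setdefault(code, char)
    return table


def stc_to_text(codes, table, missing="?"):
    """Turn a sequence of telegraph codes into text, padding codes to 4 digits."""
    return "".join(table.get(str(code).zfill(4), missing) for code in codes)


table = build_stc_table(SAMPLE.splitlines())
text = stc_to_text(["0001", "3", "9999"], table)
print(text)

# Python ships a "big5hkscs" codec, so the result can be encoded for output.
print(text.encode("big5hkscs", errors="replace"))
```

Codes missing from the table come out as `?` here, which makes gaps in the table easy to spot during data entry.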
In addition, the database uses pinyin (Mandarin) and jyutping (Cantonese) romanizations, also looked up from tables. The pinyin table has information for 13,024 characters. The jyutping table, provided by the LSHK (Linguistic Society of Hong Kong), contains 10,675 characters. For characters where more than one pronunciation is possible, I've tried to choose the most likely one. Let me know if you run into problems.
Chih-Hao Tsai's Technology Page - the source for the pinyin table
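The romanization lookup could look something like the sketch below (Python again, not the site's Perl). It assumes a hypothetical table that maps each character to a list of readings ordered from most to least likely, and simply takes the first one. The tiny tables here are hand-made samples, not the actual pinyin or LSHK data.

```python
# Sketch only: romanize text by taking the first-listed reading per character.
# PINYIN and JYUTPING are small hypothetical samples, ordered most likely first.

PINYIN = {"台": ["tai2"], "山": ["shan1"], "行": ["xing2", "hang2"]}
JYUTPING = {"台": ["toi4"], "山": ["saan1"], "行": ["hang4", "hong4"]}


def romanize(text, table, missing="?"):
    """Return space-separated readings, using `missing` for unknown characters."""
    out = []
    for ch in text:
        readings = table.get(ch)
        out.append(readings[0] if readings else missing)
    return " ".join(out)


print(romanize("台山", PINYIN))    # tai2 shan1
print(romanize("台山", JYUTPING))  # toi4 saan1
```

Taking the first reading is the simplest possible rule. It will be wrong whenever a place name uses a less common pronunciation, which is why reports of problems are welcome.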
last modified 2005 June 27 by Dominic Yu | contact: firstname.lastname@example.org