Thursday, December 29, 2005

Scaling back my research: Reality sets in

Research update, Thursday night. Here are the highlights:

1) I have once again changed the scope of my research: The old plan was to conduct a computer-assisted text analysis over the Deng (鄧小平) and Jiang (江澤民) eras, but for reasons which will be explained later, I have opted to just do the Deng period, plus a little overlap on either side of his reign. I have also cut down the number of planned searches of specific variables and combinations of variables from more than 30 to 23.

2) I have completed a census of all NCNA English news items (stories, briefs, summaries, but not exchange rates or technical notes related to the wire service) for each month, from January 1977 to December 1993.

3) I have created a basic spreadsheet model that includes searches for variables, as well as derivative variables, comparisons of variables, and other data.

Here are the details, for anyone who may be interested:

First, a key for the content variables I am working with:

V = Vietnam and related terms
K = Kampuchea and related terms
L = Laos and related terms

S = Soviet Union/Russia and related terms
U = United States and related terms

I = United Nations and related terms
A = ASEAN and related terms

Right now I have an Excel spreadsheet, with worksheet tabs for each year from January 1977 to December 1993, broken up by month. The A column contains the following:

V
% of NCNA total

V+K

V-K
% V items with K
% V items without K

Ratio V:K items

V+L

V-L
% of V items with L
% of V items without L

Ratio V:L items

Ratio V+K:V+L items

(list goes on to Row 150, including spaces)

Bolded items are searches I have to perform; the non-bolded items are derived results using Excel formulae (for instance, V-K can be obtained by subtracting V+K from V). When I refer to V, it means the number of news items that mention Vietnam anywhere in the text. V+L corresponds to all items with Vietnam and Laos in the full text. In other words, an article that mentions both Vietnam and Laos in the text of the story will be counted. If there are four articles that meet these criteria in a given month, my Excel spreadsheet will record "4". If I refer to V+L-S, it means a search for all NCNA news items that mention both Vietnam and Laos, but not the Soviet Union. "% of V items with L" will display the percentage of NCNA items mentioning Vietnam that also mention Laos.
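For anyone who wants to see the derived rows spelled out, here is a minimal Python sketch of the same arithmetic the Excel formulae perform for one month. The function name `derive_rows` and the sample counts are made up for illustration; they are not taken from the actual spreadsheet.

```python
def derive_rows(total_items, v, k, v_and_k):
    """Derive the non-bolded rows for one month from the searched counts.

    total_items: all NCNA items that month (from the census)
    v, k:        items mentioning Vietnam / Kampuchea (searched)
    v_and_k:     items mentioning both (searched)
    """
    def pct(part, whole):
        # Percentage rounded to one decimal; None when the base is zero.
        return round(100.0 * part / whole, 1) if whole else None

    v_without_k = v - v_and_k  # the "V-K" row: V minus V+K
    return {
        "V": v,
        "% of NCNA total": pct(v, total_items),
        "V+K": v_and_k,
        "V-K": v_without_k,
        "% V items with K": pct(v_and_k, v),
        "% V items without K": pct(v_without_k, v),
        "Ratio V:K items": round(v / k, 2) if k else None,
    }


# Hypothetical counts for a single month, for illustration only.
rows = derive_rows(total_items=2000, v=50, k=40, v_and_k=20)
for label, value in rows.items():
    print(f"{label}: {value}")
```

The same pattern would repeat for the V+L block and the other combinations, with only the searched counts changing.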

Relating to point 1, here's the reason for the change of heart. In a nutshell, reality set in -- I realized that the number of searches required to get a baseline of NCNA items for the Jiang era would require at least 15 hours of additional work in LexisNexis. 15 hours is a big deal for someone who has a full-time job and a full-time family. It would mean pushing back the starting point for writing my thesis proposal by two weeks, at least. Just doing the Deng-era NCNA census totalled well over 1000 searches, or about 12 hours of work in front of the computer.

But here's the part that's most frustrating: It doesn't have to take so much time to do these searches. The problem lies in the tool I am using. The more I use LexisNexis, the more I am aware of the limitations of the interface and the results that are displayed. When I try to gather monthly totals of NCNA English items, the error message that results when more than 1000 hits are returned causes lots of problems for me, because by the early 90s each month typically resulted in more than 5000 NCNA news items. Practically speaking, it meant more than 100 searches per year, compared to fewer than 40 per year in the early 80s. If I could perform SQL queries on the LexisNexis database, instead of using the crappy Web form, I could have had the same results in less than an hour.

Another issue relating to point #1: Every time I add a new set of search variables in the vertical column of my spreadsheet, I am adding at least two hours to the amount of time I have to spend on research, because each additional variable has to be tested for each month over a 16-year period, or N x 12 x 16. That's about two hours per variable. With seven variables under study (V, K, L, S, U, I, A) the permutations number in the hundreds. Where do I stop?

At first, I identified about 35 crucial variables and combinations, but after considering the time it would take me to complete all 35 x 16 x 12 searches, I whittled the list down to 23. But I have to consider the end results of the dropped searches -- the data I get from these extra searches could yield significant patterns or other trends that shift the tone of my thesis.
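To put numbers on the N x 12 x 16 formula, here is a small Python sketch. It only restates the arithmetic from the post; the function and constant names are made up for illustration.

```python
MONTHS_PER_YEAR = 12
YEARS = 16  # the 16-year period cited above


def searches_needed(num_variables, years=YEARS):
    """One search per variable (or combination) per month: N x 12 x 16."""
    return num_variables * MONTHS_PER_YEAR * years


per_variable = searches_needed(1)          # 192
original_plan = searches_needed(35)        # 6720
reduced_plan = searches_needed(23)         # 4416
saved = original_plan - reduced_plan       # 2304

print(per_variable, original_plan, reduced_plan, saved)
```

Dropping 12 variables and combinations therefore removes 2304 searches from the plan, which is why the cut from 35 to 23 matters so much in practice.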

OK, it's 11 pm, time to hit the sack. I'll release another progress report this weekend.

Tuesday, December 27, 2005

Creating a census of NCNA news items

I am spending a good deal of my Christmas-New Year vacation doing the repetitive yet necessary searches and data entry tasks related to creating a census of New China News Agency (新華社) news items between mid-1978 and mid-2003. (Read this for a rough explanation of my current research plan)

I am not using any sampling techniques. My sample is the entire population of NCNA news items -- articles, features, editorials, briefs, summaries; but not exchange rate tables or technical broadcasts sent over the NCNA wire. It basically entails counting every NCNA news item for each month in the period under study by using LexisNexis Academic, and then inputting the results in an Excel spreadsheet, where simple calculations and statistical analysis may be conducted when I start to count individual content variables (i.e., the names of countries in Indochina, the U.S., USSR, ASEAN and the UN).

This process is unfortunately very manual, mainly owing to the limitations of the LexisNexis tool. It's simply not designed for the type of research I am conducting. A major problem is the error message that shows up every time a search returns more than 1000 hits. This means I have to break up each month into three segments (up to 1985), six segments (in the late 1980s), and possibly more if the NCNA output continues to rise. In 1985 there must have been a shift in production and resources -- perhaps in conjunction with the oft-cited campaign to make NCNA a "world news agency" -- because output nearly doubled from 1839 items in February 1985 to 3358 in June 1985. By late 1986 monthly output was topping 5000 news items.

Besides the obvious impact on the frequency of my variables in NCNA news items, practically speaking it means I have to spend more time conducting searches. The 1980s will take more than 500 individual searches to create an accurate census of news items. I can conduct about 150 searches and enter the results in the Excel tables per hour -- meaning I still have many hours to go before I finish up with the NCNA census, and can start counting my content variables singly and in combination.
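The month-splitting workaround described above can be sketched in Python. This is only an illustration under stated assumptions: it assumes output is spread evenly across the days of a month (it is not, which is why more segments are needed in practice than the minimum computed here), and the function name `month_segments` is hypothetical. Only the 1000-hit limit comes from the post.

```python
import calendar
import math
from datetime import date

HIT_CAP = 1000  # the result limit described in the post


def month_segments(year, month, expected_items, cap=HIT_CAP):
    """Split a month into contiguous date ranges of roughly equal length.

    Uses the minimum number of segments that would stay under the cap
    if items were spread evenly over the month (an assumption).
    """
    days = calendar.monthrange(year, month)[1]
    n = min(days, max(1, math.ceil(expected_items / cap)))
    segments = []
    start = 1
    for i in range(n):
        end = (days * (i + 1)) // n  # last day of this segment
        segments.append((date(year, month, start), date(year, month, end)))
        start = end + 1
    return segments


# June 1985 had 3358 items according to the census.
for first, last in month_segments(1985, 6, 3358):
    print(first.isoformat(), "to", last.isoformat())
```

If any segment still exceeds the cap, that segment would have to be split again by hand, which is exactly the repetitive part of the process.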

Wednesday, December 21, 2005

Harvard's rock connections

I've been a Weezer fan since the blue album, and was lucky enough to see them perform in Taipei in 1997, a year after Pinkerton was released.

It turns out that Rivers Cuomo, Weezer's singer, rhythm guitarist, and songwriter, is a Harvard student, according to this Boston.com article. I knew that he attended Harvard College in the 1990s, but I am not sure whether he has returned to finish that degree or is in a graduate FAS program -- the Boston.com article doesn't make it clear.

Other rock/Harvard connections include Tom Morello, the guitarist for Rage Against the Machine and Audioslave, Ben Deily of the Lemonheads, Clay Tarver of Chavez, and Jacob Slichter, drummer for Semisonic (and author of So You Wanna Be A Rock Star).

Monday, December 19, 2005

Thesis update, 12/19/05

I've been working on my thesis planning since my last post, and thinking of how I need to rework the variables that will be studied in my computer-assisted content analysis of New China News Agency (新華社) articles.

A major change from the last iteration of "the plan" is I am dropping the study of specific issues -- namely, China's territorial dispute with Vietnam in the South China Sea, and issues surrounding Vietnam's treatment of ethnic Chinese. These issues are not only dissimilar to each other, but are also not comparable with country-based content variables like Kampuchea or the Soviet Union. Parsing these variables gets complicated, and I believe will lead to additional questions that lie outside the scope of my original hypotheses.

I am not dropping the idea of employing a methodology that uses NCNA as a barometer of official views of Vietnam, but I am shifting to a different set of variables, and increasing the length of the period under study. The new plan is to concentrate on the main country variable under study, Vietnam, plus two other regional country variables: Kampuchea and Laos. I will be measuring the correlation between these variables, as well as between superpower variables (Soviet Union/Russia, and the United States) and premier global and regional groupings -- the UN and ASEAN. (To this end I spent the better part of the weekend building and testing search strings in LexisNexis Academic for all country and regional grouping variables. I'll talk more about this in a separate post -- not only was it an interesting process, but it also revealed significant problems with one variable.)

Additionally, I will extend the period under study to cover the Jiang Zemin (江澤民) era as well as the Deng Xiaoping (鄧小平) era. This adds more than 10 years to the timeline, and several thousand additional searches in LexisNexis. Yes, it's a pain, but the results will add an additional basis of comparison which can create many extra opportunities for analysis and insight.

There is an ALM thesis writers' meeting tomorrow, on the 20th, at 51 Brattle Street. I'll see if others at the meeting have any suggestions about my revised approach.

Tuesday, December 13, 2005

LA Times editorial on China's PR problem

Excellent editorial in the Los Angeles Times today, entitled "China's PR Problem." The gist of the editorial is summed up in the first sentence:
There's something about the international spotlight that spurs Chinese officials to admit wrongdoing, raising doubts about whether they would have done anything without the glare.
But this points to a larger problem for China: Being forced to respond to an issue because it can't control the message.

Chinese authorities can act like despots -- if they so desire -- because they control people's access to the facts about environmental disasters, police brutality, prison camps, political arrests, and other embarrassing incidents or policies. Even when the facts get out by word of mouth, control of domestic media coverage means they can sweep things under the rug, or prevent outrage from spreading.

Or can they?

The answer increasingly seems to be, "no, they can't!" It's not just transborder incidents like the environmental catastrophe on the Songhua River, or Hong Kong demonstrations, or negative international press coverage that force China to respond. It's the fact that the Chinese government is losing control over information within its own borders, thanks to widespread use of the Internet, mobile telephones, and an increasingly prosperous and demanding citizenry that is willing to speak out against government incompetence, cruelty, and corruption. The Chinese press is increasingly outspoken, too.

And these trends will continue to grow, causing more embarrassment and problems for the government there. There's not much China's government can do about it -- except, of course, reform itself at all levels, and overhaul the country's legal system to give real rights to its people.

Saturday, December 10, 2005

Fairbank conference on studying modern China

I just returned from the Fairbank conference at Harvard: "Studying Modern China: Past, Present, and Future".

Despite yesterday's weather, and the piles of snow still on Cambridge's streets, the sessions were packed -- standing room only. It's possible the CGIS building didn't have a bigger lecture room, but I felt for such a high-profile academic field (or group of fields) open to the public, this event should have taken place in a room that had twice the capacity.

There were a lot of luminaries on the program and in the audience. I recognized Ross Terrill, Merle Goldman, and a few others. Unfortunately, I couldn't see all of the very intriguing-sounding sessions over the three-day conference, just two this morning: panels on Chinese Domestic Politics and China's International Relations. I have to admit that I didn't hear anything particularly earth-shattering in either of them. As Alastair Iain Johnston noted about current scholarship on international relations theory as it involves Chinese foreign policy, there are a limited number of full-time academics who specialize in this field, and one tends to see the same names over and over again. I've already read many of them, as well as the relevant issues they deal with. I only felt the need to take one page of notes combined from the two sessions.

Interestingly, two issues which were only mentioned in passing were the impact of propaganda on domestic Chinese politics and foreign policy, and the Internet's social impact among urban Chinese. True, there are only a limited number of issues that can be mentioned in two 90-minute sessions, but in my view (admittedly greatly biased in favor of my own academic interests) these topics should have been addressed in depth. Maybe they were at a later session?

Thursday, December 08, 2005

WaPo's Shanghai "Live Discussion" -- a model for Harvard?

The Washington Post online has an excellent resource called "Live Discussions". They are basically moderated online discussions with experts on a variety of subjects. You use a form to submit a question to the expert, and the expert replies (of course, depending on time, expertise, and willingness to respond). Users reload the web page to see the new questions and responses. The Post has been doing these for at least a few years, but has greatly increased production -- there are now several Live Discussions every day.

This morning they had an interesting session with Peter Goodman, the Post's staff writer in Shanghai. I was able to have two questions answered by Peter, the transcript of which I am including below:
Waltham, Mass.: What examples have you observed of Internet communications -- email, websites, discussion forums, etc. -- undermining local, provincial, or central government authority in China?

Thanks, Ian

Peter S. Goodman: Check out my colleague Ed Cody's recent work dissecting peasant revolts, in which he has shown how text messaging on mobile phones has been key in organizing people, with farmers running to confront police as they arrive to break up actions aimed at protecting land against development. In Shanghai, people arguing that they are getting ripped off by developers who have pushed them off land without fair competition have been able to research and organize via Web sites and e-mail. Of course, they also leave a trail that then allows the government to find them and shut down the leaders. This is a key question here: Is technology a tool of subversion or a tool of repression wielded by the state? Obviously, it's both, though I'd argue a little more of the former.

...

Waltham, Mass.: I've seen reports of thousands of "mass incidents" across China. Are these mostly taking place in rural or poorer industrial areas, or even in the economically booming areas of the country ... Shanghai, Beijing, Guangzhou, Shenzhen, Xiamen, etc.?

Peter S. Goodman: They seem to be happening all over, including -- and maybe especially -- in booming coastal areas, where land values are at a premium. Generally, these uprisings are over land use, with local officials turning farm acreage into golf courses, factories, science parks, villas. The villagers, many of whom have seen incomes slip during China's boom, demand compensation or an end to development or a limit to pollution. The local officials keep going and sometimes bring in police or goons for hire to break things up. If the stakes are high enough, things can get very ugly. Again, have a look at Cody's really excellent run of stories on this.
The apparent success of the Washington Post's Live Discussions makes me wonder: Perhaps Harvard could do something like this. The University certainly has the expertise across many subject areas. It already holds lectures and symposia that are open to the public -- why not take it a step further and open up this expertise to a wider online audience? It would be easy to promote it to alumni and members of the community first, and then start publicizing it on the Harvard homepage every week. The online environment isn't that hard to set up, the sessions would only be taking an hour of a professor's time (plus a short training session with the moderator), and they can answer the questions they feel like discussing, while avoiding the cranks or off-topic questions.

Wednesday, December 07, 2005

The Internet and press plagiarism in China

I've blogged before about ethical problems facing Chinese journalism, but there's another facet of the Chinese press that EastSouthWestNorth blog brings to my attention: Plagiarism.

ESWN translates an excellent Xinhua commentary by Yuan Bixia (原碧霞), entitled Plagiarism: How News Has Become 'True Lies' (抄袭盛行 新闻怎成"真实的谎言"). Yuan notes the rise of a journalistic culture which thrives on rampant copying from the Internet. While some people (usually the victims of plagiarism) are upset, the journalistic establishment in China seems to shrug it off as business as usual. From the translation:
... Since plagiarism is easy and convenient, and the resulting product is an "in-depth" report, the effort is small and the results are huge. By comparison, those who refuse to plagiarize are the weak ones in media. Today, some of them have begun to "learn" from their colleagues and mutual plagiarism is an open secret in the media.

"These days, there are some young people who just entered the business. They don't know how to gather news and they are too lazy to do it. When I pressed them for the reports, they just get on the Internet to 'dig'. There is no point in criticizing them, because this is hopeless. What are we going to do?" A certain friend who is in charge of strategic planning at a newspaper told his reporter.

Within the media industry, the "popular" types of plagiarism are classified as follows:

1. The most fair and open plagiarism: Copy the "press releases." When you attend various types of press conferences, you receive press releases. You do not dig for the news at the conference. You just sign your name and release the article.

2. The most notorious plagiarism: Copy from your peers. When you see a news item published for another place, you change the time and place to turn it into your local news.

3. The most undetectable plagiarism: Copy from a book. This type of situation appears in certain service-related news. Some reporters buy books about health and medicine and then change the contents of those books into news stories gathered by them.

4. The most realistic plagiarism: Copy from reportage and investigative reports. Someone else might have spent years to write an investigative report, but the reporter extracts one portion and presents it as original work.

5. The most "subtle" plagiarism: Hire someone to gather the news and sign your own name. Presently, in some places, the assigned reporters may hire temporary workers to gather the news and then sign their own names for publication.
I have to say that #1 is fairly common in the English-speaking world, especially when product releases are involved. It's lazy and it's wrong, but it happens, and editors don't seem to really care. Situations #2 and #3 are rarer, but cases still seem to surface every week and, if uncovered, usually result in someone losing his or her job.

#4 is rare, probably because of the increased risk of getting caught, but it should be noted that it's quite common for high-profile investigative pieces to spawn follow-up or additional investigation by other media outlets, at least in the United States.

As for #5, five years ago I would have said "impossible" -- mainly because the idea of a reporter actually having full-time help was unlikely -- but I have since found that "stringer abuse" does happen, as evidenced by the 2003 case involving Pulitzer Prize-winning New York Times correspondent Rick Bragg.

The professional shame associated with plagiarism here, not to mention potential legal repercussions, makes it a less common problem in the United States. But, as in China, the ease of copying from the Internet makes it a temptation for journalists who don't care, or have other motives.

Unfortunately, bad habits, once learned, are hard to abandon. As long as the Internet makes massive amounts of news available in an easy-to-copy format, plagiarism will continue to plague the profession in China and, to a lesser degree, in the United States.

Tunes for writing or studying

Follow-up on yesterday's post about Taiwanese bands. This one's about music to study to, or write to. Today's students use personal computers that have sophisticated entertainment functionality built in, and, as natural multitaskers, will use iTunes or similar programs to listen to music or other audio feeds while they work on papers or other research.

Personally, I like to listen to music when writing or studying as well. But not any music will do.

Pop songs, folk, jazz, and rap don't work. They either distract or jar me. Ditto for dance or house -- the beat or arrangements pull me in, and I lose my train of thought. KLF and Jamiroquai are in my mp3 collection, but I almost never play them when I am trying to get serious work done.

I've found piano or classical guitar is good ... works performed by Narciso Yepes and Mikhail Pletnev are in my playlist.

Rock music is harder to listen to. Weezer? Lenny Kravitz? No way. Too many hooks. Ditto for most classic rock I have on disc.

What does work in the rock pantheon? I'm able to listen to Tool -- any of their first three albums -- and really get a lot of work done. It's long, kind of mathematical, yet alternates between quiet and loud, hard and ethereal. Nine Inch Nails, too, except for the stuff with hooks (Starf'ers Incorporated, etc.). Some Pavement as well. Mandarin or Taiwanese songs are good, because it's easy to tune out the lyrics (if you are a native English speaker, that is).

iTunes has a very good streaming radio feature that even works with dial-up modem, if you select the low-bandwidth streams. I prefer Radioiorock but there are lots of other good streams out there -- unfortunately, not too many foreign language streams.

Anyone else have good tunes to study by?

Monday, December 05, 2005

Taiwanese rock -- best bands?

OT post. I'm listening to one of the best Taiwanese rock bands on iTunes, maybe the first "band" that gained any real popularity in Taiwan in the 1990s. The band is 五月天, the album is 愛情萬歲, and the musicianship and lyrics are excellent throughout. If the first track, 為什麼 ﹣今日的愛情, was sung in English instead of Taiwanese it could have made some headway on U.S. airwaves.

Some of you may say, "五月天 are just a bunch of Mandopop crooners. And what about 五百 and China Blue?" My answer: 五月天 are definitely not some record label creation designed to appeal to teen pop fans. They formed on their own in college, worked themselves through the underground club scene in 1997 and 1998, and wrote their own songs. And the songs are great. They have a pop edge to be sure, but they are well-crafted and guitar-oriented ... the lead guitarist told me his biggest Western influence was Pink Floyd, for goodness sake!

五百 is in a class by himself. He came earlier, is a more versatile songwriter, and rocks harder. But while he and his band are a single musical entity, the music industry wanted to promote him as an individual "star," and therefore his image is different than that of 五月天, IMHO.

Additionally, 五月天 and 五百 and China Blue are just the tip of the iceberg. They happened to get famous. There are a lot of Taiwanese bands still slogging it out in the underground. Wikipedia gives a short rundown in English; from that list I recommend Clippers and LTK Commune as good starting points.

Fairbank conference on China studies

There's an interesting conference taking place at the Fairbank Center about China studies, "Studying Modern China: Past, Present, and Future." Here's the agenda. Fortunately, part of it takes place on the weekend so I'll be able to attend at least one session, "China's International Relations."

Sunday, December 04, 2005

Another "uh-oh" moment for my thesis

I spent the better part of the weekend reading The Content Analysis Guidebook, by Kimberly Neuendorf. It was originally assigned by Doug and Joe Bond in my ALM proseminar back in 2003, but I've revisited it a few times since then. (Check out the book's official website, it's an excellent resource.)

Anyway, rereading the entire book forced me to consider how I am going to treat variables in my research. In the field of computer-assisted content analysis (a subset of which is computer-assisted text analysis, or CATA) this is a very rigorous process. Unfortunately, my simplistic view of foreign policy issues as currently formulated cannot easily fit into a logical system of measurable dependent and independent source variables and corresponding message variables. I had been considering issues like "overlapping territorial claims in the South China Sea", "Vietnam's treatment of ethnic Chinese," "Vietnam's relations with Kampuchea", "Vietnam's relations with the Soviet Union" to be equal variables in terms of measuring them in NCNA coverage. But they cannot be. "Ethnic Chinese" is a single variable, but something like "relations with the Soviet Union" encompasses a mass of sub-issues, including military cooperation, economic aid, etc.

This led me to a fork -- should I break up "country" variables into smaller, more easily compared pieces? That would be problematic, I concluded. There are just too many of them over the 15-year period under study. They are not only hard to catalog, but some may also not lend themselves to database searches and measurement.

Another option I am considering is simplifying my methodology to only measure countries (Vietnam, Kampuchea, and the two superpowers) and including Laos as a basis for comparison with Kampuchea in my research. As many NCNA articles' formats are corrupted in LexisNexis, and don't have segmented lead paragraphs, I would use the next best thing -- article headlines -- to identify the focus of articles about Vietnam, and then full-text searches of the other country-associated terms to create a matrix that reflects the importance to NCNA (and by extension, what Lampton calls China's "leading nucleus") of certain country-variables, and correlations between multiple country-variables.

I'll post an update later this week about how this plays out. I would like to get started on the actual number crunching before the next thesis-writers' meeting on the 20th.
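To make the headline-plus-full-text idea above more concrete, here is a minimal Python sketch. It assumes each news item has already been reduced to a headline and a set of variable tags found in its full text; the data structure, the function name, and the sample headlines are all invented for illustration and are not real NCNA items. Matching only the string "vietnam" in the headline is also a simplification.

```python
OTHER_VARIABLES = ("K", "L", "S", "U")  # Kampuchea, Laos, Soviet Union, United States


def vietnam_focus_counts(items):
    """Count the other country variables among items whose headline mentions Vietnam."""
    focused = [item for item in items if "vietnam" in item["headline"].lower()]
    counts = {var: 0 for var in OTHER_VARIABLES}
    for item in focused:
        for var in OTHER_VARIABLES:
            if var in item["tags"]:
                counts[var] += 1
    return len(focused), counts


# Made-up items for illustration only.
sample_items = [
    {"headline": "Vietnam shells border villages", "tags": {"V", "K"}},
    {"headline": "Vietnamese troops in Kampuchea condemned", "tags": {"V", "K", "S"}},
    {"headline": "Laos marks national day", "tags": {"L", "V"}},
]

focused_total, counts = vietnam_focus_counts(sample_items)
print(focused_total, counts)
```

Dividing each count by the number of Vietnam-focused items would give one row of the matrix; repeating the exercise per month would show how the emphasis shifts over time.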

Thursday, December 01, 2005

The English language is a minefield when computers get involved

Cambridge, we have a problem.

I just ran into a potential stumbling block with my research. Prior to carrying out the body of my quantitative research, I am testing out LexisNexis using various search terms singly and in combination, to see if any potential word usage problems crop up.

Why? There are two reasons. Number 1: LexisNexis does not recognize periods or capital letters in searches, even when the term is enclosed in quotation marks. Therefore, searching for "u.s.s.r." and "ussr" will return the same results. But "u.s." looks like "us" to the system, and a search of articles mentioning "u.s.", as in "United States", will also return results for "us", the pronoun. This makes it difficult for me to include articles that say "U.S." without saying "United States".
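The collision is easy to reproduce in a few lines of Python. The `normalize` function below only mimics the behavior described above (ignoring case and periods); it is an assumption about the effect, not a description of how LexisNexis is actually implemented. The second function shows what a case- and period-sensitive match would look like if the raw article text were available locally, which it is not through the Web form.

```python
import re


def normalize(term):
    """Lowercase and drop periods, mimicking the search behavior described above."""
    return term.lower().replace(".", "")


# "U.S." followed by anything other than the start of "S.R." (to skip "U.S.S.R.").
US_ABBREVIATION = re.compile(r"\bU\.S\.(?!S\.)")


def mentions_us_abbreviation(raw_text):
    """Case- and period-sensitive check, only possible on raw text."""
    return bool(US_ABBREVIATION.search(raw_text))


print(normalize("U.S.S.R.") == normalize("ussr"))  # harmless collision
print(normalize("U.S.") == normalize("us"))        # harmful collision
print(mentions_us_abbreviation("The U.S. delegation arrived."))
print(mentions_us_abbreviation("The U.S.S.R. told us nothing."))
```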

Number 2: A computer system like LexisNexis does not understand what I want it to do unless I tell it exactly what to do. There are expected issues with multiple words applying to the same concept (for instance, Cambodia and Kampuchea). In most cases the operator "or" solves a lot of potential conflicts. For instance, firing up LexisNexis and entering into the New China News Agency catalogue the search string:

kampuchea or kampuchean or cambodia or cambodian or phnom or sihanouk or khmer

will turn up nearly every New China News Agency news item relating to the country if applied to the full text of every NCNA dispatch during the period under study. I have tested for obvious names or terms which might relate to the variable under study without explicitly mentioning that variable by its most common name in English. For instance, "Sihanouk" is included in my planned search string for Kampuchea, because there are a few dozen stories which mention King Norodom Sihanouk without mentioning the country or its people during the period from 1978 to 1992. "Khmer" covers articles which mention the Khmer Rouge or Khmer people.
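As a rough illustration, the search string above can be generated from a term list, and the same list can be used to test sample text locally. This is a Python sketch, not anything LexisNexis provides: the function names are made up, and the whole-word matching is an assumption about how the terms should behave.

```python
import re

# The Kampuchea term list from the search string above.
KAMPUCHEA_TERMS = [
    "kampuchea", "kampuchean", "cambodia", "cambodian",
    "phnom", "sihanouk", "khmer",
]


def or_query(terms):
    """Join terms with the "or" operator to form one search string."""
    return " or ".join(terms)


def mentions_any(text, terms):
    """True if any term appears as a whole word in the text (case-insensitive)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return any(term in words for term in terms)


print(or_query(KAMPUCHEA_TERMS))
print(mentions_any("Prince Sihanouk arrived in Beijing today.", KAMPUCHEA_TERMS))
print(mentions_any("Talks were held in Hanoi.", KAMPUCHEA_TERMS))
```

Keeping each variable's terms in one list makes it easy to add or drop an alternate name and regenerate the search string, rather than retyping it for every month.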

But I encountered a big problem when testing for alternate terms in articles that mention the United States without saying "United States". Some terms, like "Washington", work out fine. But "America" or "American" are very problematic, because there are many stories in the NCNA catalogue which mention these words but have nothing to do with "United States" -- those stories that mention Central America, South America, Latin America, or North America. Remember, China during the Deng years (the late 1970s and 1980s and early 90s) still saw itself as a champion of the third world, a critic of the superpowers (then both heavily involved in Latin American conflicts), and an active counterweight to Taiwan's influence in the region. Thus, there are thousands of NCNA stories that discuss issues relating to the Americas but not the United States.

But there are also many NCNA items that relate to these countries as well as the United States. Additionally, LexisNexis maps certain terms -- most notably, the names of countries in Central America -- to the word "America" or "American". I found this out by reviewing lists of NCNA stories by testing various word combinations and exclusions, and there is no way to counter this -- it is hard-wired into the LexisNexis database by default.

Or is there? It is an issue I will need to address with my searches, or with an admission that one of my variables cannot be accurately searched in LexisNexis.
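For what it's worth, the "America" ambiguity described above is the kind of thing that could be filtered if the article text were available locally. The Python sketch below is only a thought experiment under that assumption; it does nothing about the term mapping inside LexisNexis itself, and the function name and sample sentences are invented for illustration.

```python
import re

# Regional phrases that use "America(n)" without referring to the United States.
REGIONAL = re.compile(
    r"\b(?:central|south|latin|north)\s+america(?:n|ns)?\b", re.IGNORECASE
)
AMERICA = re.compile(r"\bamerica(?:n|ns)?\b", re.IGNORECASE)


def mentions_us_via_america(text):
    """True if "America(n)" remains after the regional phrases are removed."""
    stripped = REGIONAL.sub(" ", text)
    return bool(AMERICA.search(stripped))


print(mentions_us_via_america("Latin American leaders met in Lima."))
print(mentions_us_via_america("American troops left Central America."))
```

An item that mentions both Central America and American troops still counts, which matches the case where a story relates to the region as well as the United States.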