Saturday, November 26, 2005

Testing LexisNexis functionality

As I mentioned in my last post about my thesis progress, I am tackling the hard-core data gathering and computation first, before I return to crafting my proposal. I spent three or four hours today formulating my search terms in LexisNexis and running sample tests. For my research paper earlier in the year I used the LexisNexis Academic GUI to formulate my searches; this time I am going to use the more primitive interface. It is still part of the GUI, but it consists of a single field that takes a combination of search terms and special search words. It takes a little getting used to, but it lets me save common searches in a separate text file, which reduces the chance of the errors I might make when picking article segments (headlines, bylines, etc.) and boolean search terms from the GUI's drop-down lists.
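For what it's worth, here is a rough Python sketch of how saved searches could be kept in a plain text file, one per line, and read back later for pasting into the single search field. The file layout (a label, a tab, then the query) and the function names save_searches and load_searches are my own assumptions for illustration; nothing here talks to LexisNexis itself.

```python
from pathlib import Path


def save_searches(path, searches):
    """Write a dict of {label: query} to a text file, one search per line.

    Assumed layout (hypothetical): label, a tab character, then the query.
    """
    lines = [f"{label}\t{query}" for label, query in searches.items()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def load_searches(path):
    """Read the file written by save_searches back into a dict."""
    searches = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        label, query = line.split("\t", 1)
        searches[label] = query
    return searches
```

A query is stored exactly as typed, so it can be copied into the search field without retyping the segment names or connectors each time.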

Working with the primitive interface also taught me a lot about LexisNexis and its functionality, including some abilities not available through the GUI. Take, for instance, the following search:

hlead ( Thai w/12 "boat people" )

This tells LexisNexis to scour the database for stories that have, in the headline or lead graf, the phrase "boat people" within 12 words of "Thai". This search cannot be built with the GUI, because w/n is not completely supported there -- the GUI offers only a few preset values of n, rounded to multiples of five.
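To make the idea of a w/n search concrete, here is a minimal Python sketch of a proximity check on a piece of text. It is only an illustration of the concept, not how LexisNexis actually implements it: the tokenizer, the way distance is counted (word positions between the term and the nearest word of the phrase), and the function name within_n_words are all my own assumptions, and the real service may count words differently.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within_n_words(text, term, phrase, n):
    """Return True if the single word `term` occurs within `n` words of `phrase`.

    Assumption: distance is the difference in word positions between the
    term and the nearest word of the phrase, in either direction.
    """
    words = tokenize(text)
    term_word = term.lower()
    phrase_words = tokenize(phrase)
    k = len(phrase_words)

    term_positions = [i for i, w in enumerate(words) if w == term_word]
    phrase_starts = [
        i for i in range(len(words) - k + 1) if words[i:i + k] == phrase_words
    ]

    for t in term_positions:
        for p in phrase_starts:
            if t < p:
                distance = p - t            # term comes before the phrase
            elif t > p + k - 1:
                distance = t - (p + k - 1)  # term comes after the phrase
            else:
                distance = 0                # term sits inside the phrase
            if distance <= n:
                return True
    return False
```

Running it on a headline or lead paragraph with term "Thai", phrase "boat people" and n set to 12 mimics the search above; any value of n can be passed, not just multiples of five.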

I also discovered some problems with the Xinhua data. Much of it is not formatted into separate paragraphs, meaning I cannot use searches that focus on the lead paragraph. This is a blessing and a curse. A blessing: Sometimes NCNA stories dig at Vietnam or other countries in stories that are ostensibly about some unrelated topic, so these results will be included even if the terms show up at the end of the article. A curse: These searches will also turn up some results that refer to the terms only in passing or as background, which I would rather not have to deal with.
