I am spending a good deal of my Christmas-New Year vacation on the repetitive yet necessary searches and data entry needed to create a census of New China News Agency (新華社) news items between mid-1978 and mid-2003. (Read this for a rough explanation of my current research plan.)
I am not using any sampling techniques. My sample is the entire population of NCNA news items: articles, features, editorials, briefs and summaries, but not exchange rate tables or technical broadcasts sent over the NCNA wire. The census entails counting every NCNA news item for each month in the period under study using LexisNexis Academic, then entering the results in an Excel spreadsheet, where I can run simple calculations and statistical analysis once I start counting individual content variables (i.e., the names of countries in Indochina, the U.S., USSR, ASEAN and the UN).
This process is unfortunately very manual, mainly owing to the limitations of the LexisNexis tool, which is simply not designed for the type of research I am conducting. A major problem is the error message that shows up every time a search returns more than 1000 hits. This means I have to break each month into three segments for the years up to 1985 and six segments for the late 1980s, and possibly more if NCNA output continues to rise. In 1985 there must have been a shift in production and resources (perhaps in conjunction with the oft-cited campaign to make NCNA a "world news agency"), because output nearly doubled from 1839 items in February 1985 to 3358 in June 1985. By late 1986 monthly output was topping 5000 news items.
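The segmenting arithmetic can be sketched in a few lines of Python. This is only an illustration, not part of my actual workflow: the function names are made up, and the sketch assumes items are spread evenly across the days of a month, which they are not. That is why the real number of segments (three or six) sits above the theoretical minimum.

```python
import math
from calendar import monthrange
from datetime import date

HIT_CAP = 1000  # searches returning more than this many hits fail with an error


def min_segments(monthly_items, cap=HIT_CAP):
    """Lower bound on searches per month if items were spread evenly (an assumption)."""
    return max(1, math.ceil(monthly_items / cap))


def split_month(year, month, segments):
    """Split a month into roughly equal, non-overlapping (start, end) date ranges."""
    days = monthrange(year, month)[1]
    segments = min(segments, days)  # never more segments than days
    # Integer cut points; strictly increasing because segments <= days.
    bounds = [i * days // segments for i in range(segments + 1)]
    return [
        (date(year, month, bounds[i] + 1), date(year, month, bounds[i + 1]))
        for i in range(segments)
    ]


if __name__ == "__main__":
    # Monthly totals quoted above.
    for items in (1839, 3358, 5000):
        print(items, "items -> at least", min_segments(items), "searches")
    for start, end in split_month(1985, 2, 3):
        print(start.isoformat(), "to", end.isoformat())
```

In practice a busy week can push a single segment past the cap, so it is safer to use one or two segments more than the minimum.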
Besides its obvious effect on the frequency of my variables in NCNA news items, the rise in output means, practically speaking, that I have to spend more time conducting searches. The 1980s will take more than 500 individual searches to create an accurate census of news items. In an hour I can conduct about 150 searches and enter the results in the Excel tables, so I still have many hours to go before I finish the NCNA census and can start counting my content variables singly and in combination.
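As a rough back-of-the-envelope check on the workload, here is a small Python sketch. The plan in the example is hypothetical: it treats the 1980s as five years at three searches per month followed by five years at six, whereas the real switch happened partway through 1985. The rate of 150 searches per hour is the one quoted above.

```python
SEARCHES_PER_HOUR = 150  # my approximate rate, including data entry


def total_searches(plan):
    """plan is a list of (months, segments_per_month) pairs."""
    return sum(months * segments for months, segments in plan)


def hours_needed(searches, per_hour=SEARCHES_PER_HOUR):
    """Hours of work for a given number of searches at a steady rate."""
    return searches / per_hour


if __name__ == "__main__":
    # Hypothetical split of the 1980s: 60 months at 3 segments, 60 at 6.
    eighties = [(60, 3), (60, 6)]
    n = total_searches(eighties)
    print(n, "searches, about", round(hours_needed(n), 1), "hours")
```

The same two functions can be reused for the 1990s once I know how many segments each month needs.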