Efficient Methods for Sampling Responses from Large-Scale Qualitative Data

Citation
N. Singh, Surendra et al., Efficient Methods for Sampling Responses from Large-Scale Qualitative Data, Marketing Science, 30(3), 2011, pp. 532-549
Journal title
Marketing Science
ISSN journal
07322399
Volume
30
Issue
3
Year of publication
2011
Pages
532 - 549
Database
ACNP
Abstract
The World Wide Web contains a vast corpus of consumer-generated content that holds invaluable insights for improving the product and service offerings of firms. Yet the typical method for extracting diagnostic information from online content, text mining, has limitations. As a starting point, we propose analyzing a sample of comments before initiating text mining. Using a combination of real data and simulations, we demonstrate that a sampling procedure that selects respondents whose comments contain a large amount of information is superior to the two most popular sampling methods (simple random sampling and stratified random sampling) in gaining insights from the data. In addition, we derive a method that determines the probability of observing diagnostic information repeated a specific number of times in the population, which will enable managers to base sample size decisions on the trade-off between obtaining additional diagnostic information and the added expense of a larger sample. We provide an illustration of one of the methods using a real data set from a website containing qualitative comments about staying at a hotel and demonstrate how sampling qualitative comments can be a useful first step in text mining.
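
The sampling contrast described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure. It assumes, purely for illustration, that the number of distinct words in a comment is a usable proxy for its information content, and it uses a textbook hypergeometric calculation for the chance that a simple random sample contains a given piece of diagnostic information at least once. All function names and the example comments are hypothetical.

```python
import random
from math import comb


def information_score(comment):
    """Proxy for information content: count of distinct words (an assumption)."""
    return len(set(comment.lower().split()))


def simple_random_sample(comments, n, seed=0):
    """Draw n comments uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(comments, n)


def information_rich_sample(comments, n):
    """Keep the n comments with the highest proxy information score."""
    return sorted(comments, key=information_score, reverse=True)[:n]


def prob_seen_at_least_once(population, carriers, n):
    """Probability that a simple random sample of size n (without replacement)
    includes at least one of the `carriers` comments that mention an item,
    out of `population` comments in total (hypergeometric)."""
    return 1 - comb(population - carriers, n) / comb(population, n)


# Hypothetical hotel comments, invented for this sketch.
comments = [
    "ok",
    "clean room friendly staff slow checkin",
    "nice pool nice view",
]
rich = information_rich_sample(comments, 2)
srs = simple_random_sample(comments, 2)
p = prob_seen_at_least_once(10, 1, 5)  # item mentioned in 1 of 10 comments
```

Under these assumptions, evaluating `prob_seen_at_least_once` for increasing `n` shows how quickly the chance of capturing a rarely mentioned item grows with sample size, which is the kind of trade-off against sampling cost that the abstract refers to.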