ENTROPY OF ENGLISH TEXT - EXPERIMENTS WITH HUMANS AND A MACHINE LEARNING-SYSTEM BASED ON ROUGH SETS

Citation
H. Moradi et al., ENTROPY OF ENGLISH TEXT - EXPERIMENTS WITH HUMANS AND A MACHINE LEARNING-SYSTEM BASED ON ROUGH SETS, Information sciences, 104(1-2), 1998, pp. 31-47
Number of citations
27
Subject categories
Information Science & Library Science; Computer Science, Information Systems
Journal title
Information Sciences
ISSN journal
00200255
Volume
104
Issue
1-2
Year of publication
1998
Pages
31 - 47
Database
ISI
SICI code
0020-0255(1998)104:1-2<31:EOET-E>2.0.ZU;2-B
Abstract
The goal of this paper is to show the dependency of the entropy of English text on the subject of the experiment, the type of English text, and the methodology used to estimate the entropy. Claude Shannon first described the technique for estimating the entropy of English text by a human subject guessing the next letter after viewing a string of characters taken from actual text. We show how this result is affected by using different humans in the experiment (Shannon used only his wife) and by using different types of text material (Shannon used only a single book). We also show how the results are affected when we replace the human subjects with a machine learning system based on rough sets. Automating the play of the guessing game with this system, called LERS, gives rise to a lossless data compression scheme. (C) Elsevier Science Inc. 1998.
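Illustrative sketch
The guessing game described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' LERS system or their experimental setup: the guesser here is a plain character-context frequency table standing in for the human subject or the rough-set learner, and the names normalize, guess_counts, and shannon_bounds are hypothetical. The two bounds are the ones Shannon gave for the guessing experiment, computed from the fraction q_r of symbols identified on the r-th guess; the lower bound is only guaranteed for an ideal guesser, whose q_r are non-increasing.

```python
import math
from collections import Counter, defaultdict

# Shannon's 27-symbol alphabet: 26 letters plus the space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def normalize(text):
    """Lowercase the text and map every other character to a single space."""
    cleaned = "".join(c if c in ALPHABET else " " for c in text.lower())
    return " ".join(cleaned.split())


def guess_counts(train, test, order=2):
    """Play the guessing game on `test` and count guesses per symbol.

    Assumption: the guesser is a simple frequency table over contexts of
    `order` characters, built from `train`. Both texts must be normalized.
    """
    table = defaultdict(Counter)
    for i in range(order, len(train)):
        table[train[i - order:i]][train[i]] += 1
    overall = Counter(train)

    counts = Counter()
    for i in range(order, len(test)):
        seen = table[test[i - order:i]]
        # Guess the most frequent follower of this context first,
        # falling back to overall frequency, then alphabetical order.
        ranking = sorted(ALPHABET, key=lambda c: (-seen[c], -overall[c], c))
        counts[ranking.index(test[i]) + 1] += 1
    return counts


def shannon_bounds(counts):
    """Return (lower, upper) entropy bounds in bits per symbol."""
    total = sum(counts.values())
    n = len(ALPHABET)
    q = [counts.get(r, 0) / total for r in range(1, n + 1)]
    q.append(0.0)  # q_{n+1} = 0 closes the lower-bound sum
    # Upper bound: entropy of the guess-number distribution.
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    # Lower bound: sum over r of r * (q_r - q_{r+1}) * log2(r).
    lower = sum(r * (q[r - 1] - q[r]) * math.log2(r) for r in range(1, n + 1))
    return lower, upper


if __name__ == "__main__":
    sample = normalize("The cat sat on the mat. " * 20)
    print(shannon_bounds(guess_counts(sample, sample)))
```

Because the sequence of guess numbers determines the text for any fixed, deterministic guesser, encoding that sequence is a reversible transformation, which is the link to lossless compression that the abstract mentions. Training and testing on the same short sample, as in the usage lines, is only for demonstration and will understate the entropy of real text.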