RocketReader Readability (Reading Level) - FAQ

RocketReader is currently collaborating with the team at Project Gutenberg to supply Reading Levels for all of the English books in the Gutenberg collection. The Gutenberg FAQ is located here.

The code (as well as Windows and Linux binaries) is available as a sourceforge.net project here under the LGPL license. New: code was uploaded 11 Dec 06.

Q. What are the Benefits of Reading Level?

Reading level helps to answer the following popular questions:
  • How hard is a book to read?
  • Where may I find graded reading lists?
  • What are the hardest books in the collection?
  • What are the easiest books in the collection?
  • What books would be suitable for my children?
  • Which authors write at my reading level?

Q. What is Reading Level?

The reading level designates the reading difficulty of English texts and books in a collection. It is a rating from 0 to 100: the easiest book in the collection has a reading level of 0 and the hardest book has a reading level of 100.

Q. Who uses the RocketReader Reading Levels?

The RocketReader reading levels have been used on the stories within the RocketReader software since 2003. During 2005 and 2006 they were enhanced to include more reading factors and to normalize the results against the English books in the Gutenberg collection.

Q. How is Reading Level Calculated?

The readability calculation is complex. It takes into account 12 qualities of reading and uses statistics to spread the scores of books in the collection uniformly between the values 0 and 100. More details can be found here.

Q. How are the Reading Levels distributed?

The reading levels start at zero (easiest) and finish at 100 (hardest). The books in the collection are distributed uniformly across this range. For example, one quarter of the ebooks in the collection lie in the top quartile, i.e. in the range of scores 75 through 100. As another example, the books with reading levels zero to 10 represent one tenth of the collection.
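
One way to get this kind of even spread is to rank the books by a raw difficulty score and scale the ranks to 0..100. The sketch below only illustrates that idea. It is not the RocketReader code, and the function name `uniform_levels` and the raw scores are assumptions made for the example.

```python
def uniform_levels(raw_scores):
    """Map raw difficulty scores to levels spread evenly over 0..100 by rank.

    Illustrative assumption: a higher raw score means a harder book.
    Tied scores still get distinct ranks in this simple sketch.
    """
    n = len(raw_scores)
    if n == 1:
        return [0.0]
    # Sort book positions by raw score, easiest (lowest) first.
    order = sorted(range(n), key=lambda i: raw_scores[i])
    levels = [0.0] * n
    for rank, i in enumerate(order):
        # Rank 0 maps to level 0, the last rank maps to level 100.
        levels[i] = 100.0 * rank / (n - 1)
    return levels


raw = [3.2, 0.5, 9.9, 4.1, 7.0]   # made-up raw scores for five books
levels = uniform_levels(raw)
print(levels)  # [25.0, 0.0, 100.0, 50.0, 75.0]
```

Because the levels depend only on rank, any range of scores of a given width holds about the same share of the collection.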

Q. Why did the reading level for a given book change slightly from the value it was one month ago?

The reading levels for individual books can drift very slightly over time. This is because the reading levels are adjusted statistically to cover the reading range in a uniform fashion. This uniformity makes searches very intuitive: for example, if we want the easiest 5% of the books, we simply search the range zero to 5. However, as the book collection grows, the distribution properties of the entire collection change, and the reading level of an individual book shifts slightly with them.
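
The drift can be seen in a small sketch that takes a book's level to be the percentage of the collection with a lower raw score. This definition, the function name `level_of` and the scores are assumptions for illustration only, not the actual RocketReader calculation.

```python
def level_of(book_score, collection_scores):
    """Percentage of the collection whose raw score is below this book's."""
    easier = sum(1 for s in collection_scores if s < book_score)
    return 100.0 * easier / len(collection_scores)


# Ten made-up raw scores; the book we follow has raw score 5.0.
collection = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
before = level_of(5.0, collection)   # 4 of 10 books are easier

# One harder book joins the collection; our book's text has not changed.
grown = collection + [11.0]
after = level_of(5.0, grown)         # 4 of 11 books are easier

print(before, round(after, 1))  # 40.0 36.4
```

The book itself is unchanged. Only its position relative to the larger collection has moved.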

Q. What is the most difficult English book in the Gutenberg collection?

Currently, according to the RocketReader Reading Level, the most difficult book in the Gutenberg collection is:

"Note on the Resemblances and Differences in the Structure and the Development of the Brain in Man and Apes" by Charles Darwin.

Q. What is the easiest English book in the Gutenberg collection?

Currently, according to the RocketReader Reading Level, the easiest book is:

"Mary Olivier: a Life" by May Sinclair. "Bunny Rabbit's Diary" by Mary Frances Blaisdell comes a close second.

Q. What factors make a book hard to read?

The RocketReader Reading Level uses the following determinants (a factor marked (-) lowers the reading level):
  • the prevalence of large words
  • the prevalence of short words (-)
  • the average number of words in a sentence
  • the average number of syllables per word
  • the prevalence of digits or numbers
  • the prevalence of the most common 1000 words in the English language (-)
  • the prevalence of compound sentence clauses (sentences containing commas)
  • the average number of words per paragraph
  • the difference between the frequency distribution of the letters in the ebook and a "typical" English frequency distribution
  • the difference between the frequency distribution of the first letter of each word and a "typical" English frequency distribution of first letters
  • the number of distinct stemmed words divided by the number of words in the document
  • the occurrence of profanity - this increases the reading level to ensure that books with profanity are rated higher
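
As a rough illustration, a few of these factors can be measured with very little code. The sketch below is not the RocketReader implementation. The function name `simple_factors`, the length cut-offs for "large" and "short" words and the tiny common-word list are all assumptions, and the distinct-word ratio skips the stemming step.

```python
import re


def simple_factors(text, common_words, long_len=7, short_len=3):
    """Measure a handful of the factors listed above for one text.

    long_len and short_len are assumed cut-offs, not RocketReader's values.
    """
    words = re.findall(r"[A-Za-z']+", text)
    # Naive sentence split on runs of ., ! or ? (abbreviations are not handled).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(words)
    if n == 0 or not sentences:
        return {}
    lower = [w.lower() for w in words]
    return {
        "large_words": sum(len(w) >= long_len for w in words) / n,
        "short_words": sum(len(w) <= short_len for w in words) / n,
        "words_per_sentence": n / len(sentences),
        "common_words": sum(w in common_words for w in lower) / n,
        "comma_sentences": sum("," in s for s in sentences) / len(sentences),
        # Unstemmed distinct words; the real factor uses stemmed words.
        "distinct_words": len(set(lower)) / n,
    }


sample = "The cat sat. The dog, however, barked loudly!"
common = {"the", "cat", "dog"}   # stand-in for a 1000-word list
factors = simple_factors(sample, common)
print(factors["words_per_sentence"])  # 4.0
print(factors["short_words"])         # 0.625
```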

Q. Why use these factors?

Some of the factors used to determine the RocketReader Reading Level have been used in other reading metrics for many years to classify the readability of text. However, many of the factors are new. What makes a book more or less difficult to read is somewhat subjective. One way to check whether the reading factors generally agree with each other is to calculate the cross-correlation between the factor values over the reading corpus, e.g. the Gutenberg collection. This has been done, and there is general agreement (positive correlation) between the 12 reading factors except one: the prevalence of profanity. This factor was included simply so that books with profanity are bumped up in reading level and children working through the "easiest to read books" do not accidentally stumble across profanity.
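
A check of this kind can be sketched as a pairwise Pearson correlation between factor columns, one value per book. The helper names `pearson` and `correlation_table` are hypothetical, and the numbers are toy values invented for the example, not measurements from the Gutenberg collection.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        # Correlation is undefined for a constant column; this sketch returns 0.0.
        return 0.0
    return cov / (sx * sy)


def correlation_table(factors):
    """Correlation for every unordered pair of factor columns."""
    names = list(factors)
    return {
        (a, b): pearson(factors[a], factors[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Toy values for four imaginary books (assumed, for illustration only).
toy = {
    "words_per_sentence": [8.0, 12.0, 20.0, 25.0],
    "syllables_per_word": [1.2, 1.3, 1.6, 1.7],
    "words_per_paragraph": [40.0, 55.0, 90.0, 120.0],
}
table = correlation_table(toy)
for pair, r in table.items():
    print(pair, round(r, 2))
```

Values close to +1 mean two factors tend to rank books in the same order, which is the kind of agreement described above.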

Q. Why Not Use Well Known Reading Levels Such as Fog, Kincaid, SMOG, ARI or ____ (name popular reading metric)?

Most existing reading levels use only two to four determinants in calculating a reading score. The RocketReader Reading Level uses twelve. These extra determinants are important, especially when rating books at the lower grade levels. Additionally, the RocketReader Reading Level has nice statistical properties: each factor contributes to the overall score in exactly the percentage allowed by its weighting, and within the main corpus (20k books in the Gutenberg collection) the scores are distributed uniformly throughout the numerical range of the metric (0 to 100).
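
One plausible reading of the weighting property is a weighted average of factor values that have each been put on the same 0..100 scale first. The sketch below assumes exactly that. The function name `combine`, the factor names chosen and the weights are hypothetical, not RocketReader's actual weights.

```python
def combine(factor_levels, weights):
    """Weighted average of per-factor levels, each assumed to be on 0..100."""
    total = sum(weights.values())
    return sum(weights[name] * factor_levels[name] for name in weights) / total


# Hypothetical per-factor levels for one book and hypothetical weights.
book = {"words_per_sentence": 80.0, "syllables_per_word": 60.0, "large_words": 40.0}
weights = {"words_per_sentence": 0.5, "syllables_per_word": 0.25, "large_words": 0.25}

score = combine(book, weights)
print(score)  # 65.0
```

With every factor on the same scale, a factor with weight 0.5 accounts for half of the combined score's range, which is what makes the weights read as percentages.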

Q. I disagree with the rating on a specific ebook!

We have found that some books contain formatting that throws off the reading level. From time to time we have adjusted the text parsing to cope with these special cases. However, we acknowledge that there may still be further special cases to consider. Send us an email with the title of the book and the reason you believe it was misclassified. You can contact RocketReader here and the Gutenberg Project here.

Q. What languages are rated?

The Reading Level is designed to classify English books. A number of the reading level determinants are tied to English properties such as common abbreviations, frequently appearing words, common suffixes, methods of stemming words, syllabification, punctuation, and the frequency distributions of all letters and of the first letter of each word. For this reason the reading level is currently provided only for English ebooks.

Q. How many books have the Reading Level?

Currently, 15,114 English books in the Gutenberg collection carry the Reading Level.

Q. You silly scientists with your crazy statistics! Why reduce something as complex as a work of literature down to a single reading level?

Simply because it is useful. When children learn to read, it is great if they can advance progressively through gradually more difficult books. The RocketReader Reading Level allows reading material to be selected on this basis. It also provides many other uses and benefits to educators, researchers and adults!

Q. How long does it take to generate reading levels?

The reading level calculator classifies around two ebooks per second on a single-processor computer. For the Gutenberg collection, for instance, it takes one hour and fifty minutes to classify over 15,000 ebooks. The calculation methods are optimized for speed.
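
The two figures are consistent with each other, as a quick check shows. The function name `books_per_second` is just for this example, and 15,000 is used as a round lower bound on the number of ebooks.

```python
def books_per_second(books, hours, minutes):
    """Average classification rate over a run of the given length."""
    seconds = hours * 3600 + minutes * 60
    return books / seconds


rate = books_per_second(15_000, 1, 50)   # 15,000 ebooks in 1 h 50 min
print(round(rate, 2))  # 2.27
```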

Q. Can I rate my own texts and books?

Yes. The code is available as a sourceforge.net project here under the free LGPL license. It is free for personal, educational and corporate use under the terms of the LGPL.
 
© 1996-2009 RocketReader