Corpus Linguistics in the Classroom

Corpus Linguistics Lesson Plan


The incredible processing power of computers and the ability to efficiently scan and digitise books offer new ways to scrutinise how we use language to communicate ideas and messages. With the help of software, such as Microsoft Word and some of its basic functions, we can search for patterns in any novel or measure the frequency of words in a poetry anthology. The data we collect could lead to innovative and interesting interpretations of writers’ techniques and our language choices in the real world.

This approach is known as corpus linguistics.

The following lesson plan will guide you through the process of gathering data and exploring the language used in popular music. Hopefully, we will gain some interesting perspectives on how songs are constructed, their themes and their representation of the world. You will also develop your mathematics and ICT skills.

Step One – Gathering Data

If every student in the class selects one pop song to include in the sample, you should have a large “body” of language to analyse. Of course, the lyrics need to be more than yeah, yeah, yeah! Anything over 5,000 words should be enough to provide a meaningful and significant set of results.

Copy and paste the lyrics into a Word document. To keep the songs neat and tidy, it might be worthwhile to “Select All” and format the lines, so the typeface and font are consistent throughout the document. These options are available from the “Home” ribbon at the top of the screen:

Corpus Linguistics Lesson Plan Step One

Step Two – Finding the Word Count

We are going to look at the frequency of certain words and then express them as a percentage. Therefore, you will need to find the total number of words in the sample. You can locate this information at the bottom left of the Microsoft Word interface:

Corpus Linguistics Lesson Plan Step 2

If the figure is not available there, click on “Review” and the “Word Count” option should appear in the new ribbon. For this lesson plan, we are going to use the 6478 total from the image above.
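If you are curious how a computer arrives at a total like this, the idea can be sketched in a few lines of Python. This is only an illustration and not how Microsoft Word itself works: the function name `count_words` and the sample line are made up for the example, and Word's own count may differ slightly because it has its own rules for hyphens and punctuation.

```python
def count_words(text: str) -> int:
    """Count the words in a text by splitting it wherever there is whitespace."""
    return len(text.split())


# A made-up line standing in for the full sample of lyrics.
sample = "Tonight the rain is falling and I am walking home"

total_words = count_words(sample)
print(total_words)  # 10
```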

Step Three – Word Frequency

To find the number of times a word occurs in the sample, begin by selecting the “Find” function at the top right of the screen, identified by the magnifying glass symbol:

Corpus Linguistics Lesson Plan Step 3

You can also use the shortcut Ctrl+F to open the navigation pane:

Use the navigation pane

When you enter a word into the box, Microsoft Word will search the document for that particular query. In the following example, we searched for the word love and selected the results tab:

Search Results

As you can see, the word “love” occurs 31 times in the sample. Practise using this function with any other words you think might be important.
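The same kind of search can be imitated in Python. The sketch below assumes a simple search that ignores capital letters and counts the query wherever it appears, which is roughly how the navigation pane behaves by default. The function name `count_occurrences` and the sample text are invented for the illustration.

```python
def count_occurrences(text: str, query: str) -> int:
    """Count how often the query appears in the text, ignoring capital letters."""
    return text.lower().count(query.lower())


# A made-up line standing in for the full sample of lyrics.
sample = "Love me now. I love the way you love me."

print(count_occurrences(sample, "love"))  # 3
```

Note that this simple search also counts the query when it appears inside a longer word.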

Step Five – Percentages

To work out the percentage, take the number of times the word occurs in the sample and divide it by the total number of words. You then multiply it by 100. For our worked example, “love” appears 31 times so we divide that figure by 6478 and multiply the answer by 100.

The answer is roughly 0.5%, which might not seem like a large proportion, but it is interesting to note.
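The calculation can be checked with a short Python sketch. The function name `percentage` is made up for this example, and the figures are the 31 occurrences of “love” and the 6478 total quoted in Step Two.

```python
def percentage(occurrences: int, total_words: int) -> float:
    """Express a word's frequency as a percentage of all the words in the sample."""
    if total_words <= 0:
        raise ValueError("The sample must contain at least one word.")
    return occurrences / total_words * 100


# 31 occurrences of "love" in a sample of 6478 words.
print(round(percentage(31, 6478), 2))  # 0.48
```

Rounded to one decimal place, 0.48% becomes the 0.5% given in the worked example.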

Search Tips

If you are looking for the frequency of the pronoun “I” and you simply type “i” into the search box, the navigation function will return every occurrence of the letter, even when it appears in the middle of words. To get around this problem, you can add a space before and after the letter: “ i ”. In our worked example, this reduced the figure from an inaccurate 1698 occurrences to a more realistic 235.

Similarly, if you are looking to count the number of times “she” appears in the sample, you would need to check that words such as “shell” and “sheet” do not get included in your tally. You can add a space after “she” to exclude these words, but you might then miss instances of “she’s” or “she’d”, so you need to be careful.
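The difference between the two kinds of search can be sketched in Python. This is an illustration rather than a description of how Word works: the function name `count_whole_word` and the sample sentence are invented. It uses the word-boundary marker `\b` from Python's `re` module, so “shell” and “sheet” are left out while “she’s” is still counted under “she”.

```python
import re


def count_whole_word(text: str, word: str) -> int:
    """Count the word only when it stands alone, ignoring capital letters."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# A made-up sentence containing "she" inside other words.
sample = "She said she's fine. The shell on the sheet is hers."

print(sample.lower().count("she"))      # 4: includes "shell" and "sheet"
print(count_whole_word(sample, "she"))  # 2: "She" and "she's" only
```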

Tasks – Identity

A lot of pop songs focus on the themes of love and relationships. Some of these lyrics will have a romantic tone. Others might be more despondent or angst-ridden. However, it might be interesting to find the different words used to define the identities in the stories.

Search through your sample and complete the following table.

Word Type | Example | Number | %
Gender identifiers | Boy | |
Gender identifiers | Girl / Girly | |


In terms of love songs, it might be interesting to find the proportion of pronouns in the sample because this might indicate the extent to which the songs focus on the characters.

Word Class | Example | Number | %
First person singular pronouns | I / Me / Mine / My | |
Second person singular pronoun | You | |
First person plural pronouns | We / Us / Our | |
Third person plural pronouns | They / Their | |
Third person singular pronoun | She | |
Third person singular pronoun | He | |
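If you would like to check your table, the counting can be sketched in Python. Everything named here is an assumption made for the illustration: the dictionary `PRONOUN_GROUPS`, the function `pronoun_table` and the sample sentence are invented, and contractions such as “she’s” are counted under the word before the apostrophe.

```python
import re
from collections import Counter

# The pronoun groups from the table; each word is matched as a whole word.
PRONOUN_GROUPS = {
    "First person singular": ["i", "me", "mine", "my"],
    "Second person singular": ["you"],
    "First person plural": ["we", "us", "our"],
    "Third person plural": ["they", "their"],
    "Third person singular (she)": ["she"],
    "Third person singular (he)": ["he"],
}


def pronoun_table(text: str) -> dict:
    """Return the number and percentage of words that fall in each pronoun group."""
    # Find each word, keeping contractions such as "she's" together.
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    # Count each word by the part before any apostrophe.
    counts = Counter(re.split(r"['’]", token)[0] for token in tokens)
    total = len(tokens)
    table = {}
    for label, forms in PRONOUN_GROUPS.items():
        number = sum(counts[form] for form in forms)
        percent = round(number / total * 100, 2) if total else 0.0
        table[label] = (number, percent)
    return table


# A made-up sentence of 16 words standing in for the sample.
sample = "I told you we were fine. She said he took my coat, and they left us."

for label, (number, percent) in pronoun_table(sample).items():
    print(label, number, percent)
```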


There are some words and sounds in English that do not communicate an obvious meaning. For example, if you are talking to someone, you might use “eh” and “ah” to indicate to your listener that you are simply pausing before you complete a thought or sentence. These meaningless sounds are called fillers. The phrases “you know”, “yeah” and “like” are increasingly popular fillers used by young people.

What percentage of the lyrics is made up of these meaningless words and sounds?

Step Six – Page Results

There are some easy criticisms of the approach we have taken to analyse the sample of song lyrics. Our results could be skewed by the number of songs collected or the narrow demographic used to select them for the study. One song could repeat a particular word a ridiculous number of times and this might affect our ability to interpret the results.

You can check where our search results occur in the document by selecting the “Pages” tab in the navigation pane. If you find the majority of your results come from one or two songs, you will have to include that information in your conclusions.
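The same check can be sketched in Python if the songs are kept separate. This assumes the lyrics are stored in a dictionary with one entry per song, which is not how the Word document is organised; the song titles, lyrics and function names below are all invented for the example.

```python
def occurrences_by_song(songs: dict, query: str) -> dict:
    """Count how often the query appears in each song, ignoring capital letters."""
    return {title: lyrics.lower().count(query.lower()) for title, lyrics in songs.items()}


def share_from_top_song(songs: dict, query: str) -> float:
    """Return the percentage of all results that come from a single song."""
    counts = occurrences_by_song(songs, query)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return max(counts.values()) / total * 100


# Made-up songs standing in for the class sample.
songs = {
    "Song A": "love love love is all I know",
    "Song B": "I walk alone at night",
    "Song C": "my love is gone",
}

print(occurrences_by_song(songs, "love"))  # {'Song A': 3, 'Song B': 0, 'Song C': 1}
print(share_from_top_song(songs, "love"))  # 75.0
```

In this made-up sample, three quarters of the results come from one song, which is exactly the kind of skew you would need to mention in your conclusions.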

Select the Pages Option

It is also interesting to work out if there are common themes in the lyrics. Do the songs share certain imagery? Are there any motifs used to describe the characters and their situations? Do the writers rely on obvious clichés? Again, the “Pages” tab can help.

In our search for “love”, the word appears on five different pages. After a quick check, it is easy to confirm they are five different songs. This result might be statistically significant.

You should try searching for these common images: night; dark; light; sun; rain; wind; high; low; eyes; and hands.
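Searching for ten words one at a time is slow, so here is a Python sketch that counts them all in one pass and lists the most frequent first. The list `IMAGE_WORDS` comes from the suggestions above; the function name `rank_images` and the sample line are made up for the illustration, and only whole words are counted.

```python
import re
from collections import Counter

# The common images suggested above.
IMAGE_WORDS = ["night", "dark", "light", "sun", "rain", "wind", "high", "low", "eyes", "hands"]


def rank_images(text: str) -> list:
    """Count each image word in the text and list the most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word in IMAGE_WORDS)
    return counts.most_common()


# A made-up line standing in for the full sample of lyrics.
sample = "The rain fell all night and the night was dark, but your eyes held the light."

for word, number in rank_images(sample):
    print(word, number)
```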

Step Seven – Interpret the Data

  1. Is there any significance in the percentage of words classified as pronouns?
  2. What conclusions can you draw about the representation of people from the various pronouns used?
  3. What percentage of words could be classified as grammatical?
  4. Are there any common themes, especially with the images the lyricists employ?
  5. What conclusion can you make from these figures?
  6. Has this research changed your perspective on songs?
  7. Does this research have any implications for how you study and revise for your English examinations?

Download the File

If you would like to try this lesson plan but don’t have the time or opportunity to collect the sample material, you can download the Word document of music lyrics used to create this guide. It is important to note that the songs were selected by a class of young girls – could this skew the results?
