What is text mining and analysis?

Text mining is the use of computational methods to extract data from collections of unstructured or semi-structured text. This can be text from prose, newspaper articles, survey responses, primary sources, journals, interviews and more. The goal of text mining is to discover and extract information or patterns hidden in text, often across large collections. In this process the text is transformed into data for quantitative analysis.
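As a minimal sketch of what "transforming text into data" can mean, the Python example below counts how often each word appears across a small collection. The two survey responses are invented for illustration, and the function name word_frequencies and the simple tokenising rule are assumptions, not a prescribed method.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lower-case the text, split it into word tokens and count each token."""
    # Simplifying assumption: a "word" is any run of letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Two invented survey responses standing in for a text collection.
responses = [
    "The library was quiet, and the staff were helpful.",
    "Helpful staff, but the library closes too early.",
]

# Add up the counts from every response in the collection.
counts = Counter()
for response in responses:
    counts.update(word_frequencies(response))

# Show the three most frequent words with their counts.
print(counts.most_common(3))
```

The resulting counts are numbers rather than prose, so they can be sorted, compared or charted like any other quantitative data.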

There is a long research tradition of text analysis in the humanities. With the explosion of digital text, computational analysis methods have developed in fields including statistics, computer science, (computational) linguistics and library science. Distant reading, the quantitative analysis of a digitised text or corpus (a text collection), is a well-known humanities term for text mining and analysis methods.

All researchers, regardless of discipline, methodology or objective, can gain insights from text as data.

Case study: Six Degrees of Francis Bacon

Below is an example of an interactive visual exploration of the English philosopher and statesman Francis Bacon and his network of associations. To build it, a group of researchers text-mined personal names from the Oxford Dictionary of National Biography and linked them using computational methods. Explore it at http://www.sixdegreesoffrancisbacon.com/.

Francis Bacon Network visualisation

Source: http://www.sixdegreesoffrancisbacon.com
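
To give a feel for how names found in text might be linked into a network, here is a simplified sketch that counts how often two names appear in the same sentence. This is not the project's actual pipeline, which is not described here. The names, sentences and the function cooccurrence_edges are all invented for illustration, and matching names by exact string is a deliberate simplification.

```python
from collections import Counter
from itertools import combinations

# Hypothetical name list and invented sentences, for illustration only.
known_names = ["Alice Smith", "Bob Jones", "Carol White"]
sentences = [
    "Alice Smith corresponded with Bob Jones for many years.",
    "Bob Jones was introduced to Carol White by Alice Smith.",
    "Carol White published alone.",
]


def cooccurrence_edges(sentences, names):
    """Count how often each pair of names appears in the same sentence."""
    edges = Counter()
    for sentence in sentences:
        # Exact string matching is an assumption; real text needs more care.
        present = sorted(name for name in names if name in sentence)
        # Every pair of names in the sentence becomes (or strengthens) a link.
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


edges = cooccurrence_edges(sentences, known_names)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight}")
```

Each pair with a count is one weighted link in a network. A visualisation tool could draw the names as nodes and the links as lines between them.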


Why use computational methods?

Text is considered the main form for “communicating, discovering and processing information” (Sinclair and Rockwell, 2016). Even popular non-written forms of communication, such as streamed videos, are largely inaccessible without searching by keywords in titles or descriptions, or in the text of transcripts.

Activity: explore why researchers use text mining and analysis methods

Explore some of the reasons researchers use computational methods to analyse text:

Let’s explore the workflows of text mining and analysis in the next lesson.
