What is text mining and analysis?
Text mining is the use of computational methods to extract data from collections of unstructured or semi-structured text. This text can come from prose, newspaper articles, survey responses, primary sources, journals, interviews and more. The goal of text mining is to discover and extract information or patterns hidden in text, often across large collections. In this process, the text is transformed into data for quantitative analysis.
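To make the idea of transforming text into data concrete, here is a minimal sketch in Python that turns a tiny collection of free-text responses into word counts. The two responses and the `word_frequencies` helper are invented for this illustration, and the tokenisation is deliberately crude; real projects usually add steps such as removing stop words.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    # Crude tokeniser (an assumption for this sketch): runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical mini "corpus" of survey responses, invented for illustration.
responses = [
    "The library was quiet and the staff were helpful.",
    "Helpful staff, but the library closes too early.",
]

# Combine the per-response counts into one frequency table for the collection.
counts = Counter()
for response in responses:
    counts.update(word_frequencies(response))

# Show the most frequent words across the whole collection.
print(counts.most_common(5))
```

Once the text is in this form, the counts can be compared, charted or analysed statistically like any other dataset.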
There is a long research tradition of text analysis in the humanities, and with the explosion in digital text, computational analysis methods have developed in fields including statistics, computer science, (computational) linguistics and library science. Distant reading (quantitative analysis) of a digitised text or corpus (a text collection) is a well-known humanities term for text mining and analysis methods.
All researchers, regardless of discipline, methodology or objective, can gain insights from text as data. For example:
- Communication scholars collect and analyse news texts and scrape social media feeds
- Qualitative sociologists process and analyse hundreds of hours of interview transcripts
- Political scientists analyse collections of speeches and parliamentary transcripts
- Engineering researchers text mine and examine accident reports
- Historians access primary source materials from online repositories enabling broader perspectives and new research
- Literary scholars work with digitised single texts, translated versions of texts, whole corpora or person-of-interest collections
- Environmental scientists undertake network analysis of research literature
- Medical researchers analyse the text of electronic patient records
- Legal scholars examine large corpora of case law and legislation
Below is an example of an interactive visual exploration of English philosopher and statesman Francis Bacon and his network of associations. To do this, a group of researchers text mined personal names from the text of the Oxford Dictionary of National Biography and linked them using computational methods. Explore it at http://www.sixdegreesoffrancisbacon.com/.
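The project's actual pipeline is not described here, so the following is only a simplified sketch of the general idea: find known personal names in passages of text and link names that appear together. The name list, the toy passages and the `cooccurrence_edges` function are all assumptions made for this illustration, and simple co-occurrence counting stands in for whatever methods the researchers used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical list of names and toy passages, written for this sketch only.
known_names = ["Francis Bacon", "Ben Jonson", "Thomas Hobbes"]
passages = [
    "Francis Bacon employed Thomas Hobbes as a secretary.",
    "Ben Jonson praised Francis Bacon in verse.",
    "Thomas Hobbes later wrote Leviathan.",
]


def cooccurrence_edges(passages, names):
    """Count how often each pair of names appears in the same passage."""
    edges = Counter()
    for passage in passages:
        # Names found in this passage, sorted so each pair has one fixed order.
        present = sorted(name for name in names if name in passage)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


edges = cooccurrence_edges(passages, known_names)

# Each edge links two people; the weight is the number of shared passages.
for (person_a, person_b), weight in edges.items():
    print(f"{person_a} -- {person_b}: {weight}")
```

The resulting pairs and weights form the edges of a network that could then be drawn or explored interactively.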
Why use computational methods?
Text is considered the main form for “communicating, discovering and processing information” (Sinclair and Rockwell, 2016). Even popular non-written forms of communication such as streamed videos are largely inaccessible without searching by keywords in titles or descriptions, or from text within transcripts.
Explore some of the reasons researchers use computational methods to analyse text:
- increase validity
- repeat processes and analysis on other texts or corpora
- enable broader questions of larger corpora
- help understand texts and underlying social and cultural phenomena at scale
- expand textual studies with temporal or geographical context
- create visual explorations of text
- undertake statistical analysis of text
- gain insights from previously untapped data
Let’s explore the workflows of text mining and analysis in the next lesson.