Introduction to text mining and analysis
In this self-paced workshop you will learn steps to:
- Build data sets: find where and how to gather textual data for your corpus or data set.
- Prepare data for analysis: explore useful processes and tools to prepare and clean textual data for analysis.
- Analyse data: identify different types of analysis used to interrogate content and uncover new insights.
At the end of these lessons you should be able to:
- Implement a basic workflow for researching with digital text.
- Identify different types of textual data and usage considerations.
- Find textual data for digital analysis.
- Choose the best tool for your dataset at each stage of the digital research process.
Throughout these lessons you can download and experiment with software and datasets, and try out simple data cleaning and analysis activities. There are links to further support and videos to guide you through the lessons.
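As a taste of the kind of simple cleaning and analysis activity covered in the lessons, here is a minimal Python sketch of the build, prepare and analyse steps. It is an illustration only, not the workshop's own method: the two-sentence corpus, the stopword list and the function names `prepare` and `analyse` are all assumptions made up for this example, and it uses only the Python standard library.

```python
import re
from collections import Counter

# Build: a hypothetical two-document corpus, invented for illustration.
corpus = [
    "Text mining turns text into data.",
    "Clean the text, then analyse the data!",
]

# Assumed stopword list for this sketch; real projects use much longer lists.
STOPWORDS = {"the", "then", "into"}


def prepare(text):
    """Prepare: lowercase the text, keep alphabetic tokens, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


def analyse(documents):
    """Analyse: count how often each word appears across all documents."""
    counts = Counter()
    for document in documents:
        counts.update(prepare(document))
    return counts


word_counts = analyse(corpus)
print(word_counts.most_common(2))  # [('text', 3), ('data', 2)]
```

Word frequency counting is only one of many possible analyses; it is used here because it needs no extra software to try out.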
Want to text mine via web scraping? Check out these Web Scraping with Python lessons.
Contents:
- What | topics: What is text mining; Why use it
- How | topics: Activities; Considerations; Workflow steps
- Rights | topics: Getting ethical clearance; Licenses and access agreements; Copyright
- Build | topics: Building a dataset; Finding data
- Prepare text | topics: Prepare and format; Useful tools; Cleaning text
- Audio Video | topics: Interview transcription
- Analyse | topics: Processing methods; NLP; Common tasks; Useful tools; Useful platforms; Codings
- Visualise | topics: Requirements; Types of visualisations
- References
Griffith University acknowledges the people who are the traditional custodians of the land and pays respect to the Elders, past and present, and extends that respect to all Aboriginal and Torres Strait Islander peoples.
Copyright: © 2022 Griffith University. Apart from Griffith logos or third-party material used with permission or under another license, this material is licensed under a CC BY-NC 4.0 license.
Contributors: Benjamin McRae, Sharron Stapleton, Yuri Banens, Antony Ley
Griffith University - CRICOS Provider Number 00233E