Use scripts to automate processes

A script is a series of programming steps to automate the execution of tasks.

How OpenRefine documents changes made to data

As you conduct your data wrangling, OpenRefine saves every change you make to the dataset.

These changes are saved in a format known as JSON(JavaScript Object Notation). You can export the JSON script and apply it to other data files. If you had 20 files to clean, and they all had the same type of errors (e.g., misspellings, leading white spaces) and all files had the same column names, you could save the JSON script, open a new file in OpenRefine, paste in the script, and run it to apply the changes to the new dataset. This gives you a quick way to clean all your related data.

Activity - save the data wrangling steps as a script
  • Open Undo / Redo tab
  • Select  Extract ...

Currently all the operations you have made to the dataset are highlighted within the script window.

  • Select the steps that you want to apply to other datasets by using the check boxes.
  • Highlight all the code in the right-hand panel with Ctrl & A
  • Copy the code from the right-hand panel and paste it into a text editor (like NotePad on Windows or TextEdit on Mac).
  • Save it as a plain text file
    • in TextEdit, do this by selecting  Format > Make plain text
    • and save the file as a  .txt file.

Import a script to use with another dataset

Let’s practise running the script you just created on a new dataset. We’ll test this on a raw, uncleaned version of the dataset we’ve been working with.

Activity - import and use a script on another dataset

Create a new project in OpenRefine using the QldSharkControlProgramCatch_2017.csv dataset you downloaded at the start of the workshop.

  • Click on OpenRefine Symbol (top left-hand corner of screen), to open the menu page
  • select Create Project tab
  • select  Get data from this Computer tab > browse button
  • browse to select the file QldSharkControlProgramCatch_2017.csv you saved to your Downloads folder.
  • either click Open or double-click on the filename to import it into OpenRefine.
  • click Next
  • give the project a different name
  • select Create Project
  • click the Undo / Redo  tab (left-hand menu) >   Apply
  • paste in the contents of  .txt  file you saved.
  • click  Perform operations button
Watch these steps in this video.

The dataset should now be the same as your other cleaned dataset.

For convenience, we used the same dataset. You could use this process to clean related datasets.

For example, you could apply your changes to data that you had collected over different time periods or data that was collected by different researchers (provided everyone uses the same column headings). The data in this file was generated from a database, so the column headings for subsequent data downloads should be the same.


<-- BACK | NEXT -->