Examining Numbers & Dates in OpenRefine
When a table is imported into OpenRefine, all data is formatted as strings, basically a piece of text. We can sort column values as numbers, but this does not change the data type in a column from text to numbers. Rather, this interprets the values as numbers for the purposes of sorting but keeps the underlying data type as is.
We can, however, transform columns to other data types (e.g., number or date) using the Edit cells > Common transforms
feature. Here we will experiment changing columns to numbers and see what additional capabilities that gives us.
Be sure to close any facets you have enabled from the left panel before this step, so that we can be sure we are working on the complete dataset. You can close an existing facet by clicking the x
in the upper left of that facet window.
Date formatting
So far we’ve been looking only at ‘String’ type data. It is possible to treat numbers and dates as strings. For example in the Date column the data of capture is represented as a String. However, some operations and transformations only work on ‘number’ or ‘date’ type data, such as sorting values in numeric or date order. To carry out these functions we need to convert the values to a date or number first.
Activity - tranforming text to date formats
To transform cells in the Date
column to date format:
- Click the down arrow for the
Date
column - Select
Edit cells > Transform
- In the ‘Expression’ box type the GREL expression
value.toDate("dd/MM/yyyy")
- OK The values are now displayed in green and follow a standard convention for their display format (ISO 8601) - this indicates they are now stored as date data types in OpenRefine. We can now carry out functions that are specific to dates.
Let’s create another column with a different date format.
- On the
Date
column dropdown selectEdit column->Add column based on this column
. Using this function you can create a new column, while preserving the old column - In the
New column name
type “Formatted-Date” In the ‘Expression’ box type the GREL expressionvalue.toString("dd MMMM yyyy")
.
Other common transformations include changing letter case, data formats and more. Take a look at the other options with Edit cells> Common transforms
.
Numeric facets
Sometimes there are non-numeric values or blanks within a column which may represent errors in data entry. We can find these by using a Numeric
facet.
Activity - numeric facet
- Click the down arrow for the
Water Temp (C)
column - Select
Edit cells > Common Transforms > To number
(another easy method to transform data type) - Edit a few cells, replacing the numbers with text, such as ”abc”, and make a few other cells blank. Change the
Data type
totext
before applying. - Click down arrow again and select
Facet > Numeric facet
.
A histogram (a representation of the distribution of numerical data) opens in the left hand menu.
Notice that there are several checkboxes in this facet, Numeric
, Non-numeric
, Blank
, Error
.
Below these are counts of the number of cells in each category. You should see checks for Non-numeric
and Blank
if you changed or deleted some values.
- experiment with checking or unchecking these boxes to select subsets of your data.
- when finished exploring, close this facet by clicking the
x
in the upper left corner of its panel.
Note that closing a facet will not undo the edits you have made to the cells in this column.
- Use the
Undo / Redo
function to reverse these changes.