Filter and Sort


Filtering

There are many records in this dataset and you can filter them to work on a subset. Let’s explore filtering by a word within the text values of a column.

Activity - filter to work on a subset of data
  • Click the down arrow next to  Location > Text filter. A filter box will appear in the left margin
  • Type ‘bay’ and press return. There are 99 matching rows of the original 428 rows (and these rows are selected for the subsequent steps).
  • Limit the results to one of the locations with ‘Bay’ in the name. There are a couple of possibilities.
  • Do  Facet > Text facet  on the  Location  column after filtering. This will show 8 locations with names that match your filter.
  • To restrict to only one of these locations, select one to include.
  • Close the facet and filter.

Faceting and filtering look very similar. A good distinction is that faceting gives you an overview, description and count of all of the data that is currently selected, while filtering allows you to select a subset of your data by a string - of text in this case - for analysis or cleaning.

Watch this video on filtering

Working with columns

Removing columns

You can reorder the columns by clicking the drop-down menu at the top of the first column labelled All.

Activity - remove columns

The Number Caught column shows that each observation is about one shark, so this variable is reduntant and needs to be removed.

To do this:

  • Edit columns > Re-order / remove columns …
  • Drag Number Caught to the right hand side of the menu
  • Ok

Reordering columns

You can also drag and drop column names to reorder the columns.

Renaming columns

You can rename a column by opening the drop-down menu at the top of the column that you would like to rename, and choosing `Edit column > Rename this column’. You will then be prompted to enter the new column name.


Sorting data

Using Area variable, let’s explore data using sorting.

Activity - sort by text
  • In Area column select drop-down menu and Sort. Options include:
    • Sort by text
    • Sort by numbers
    • Sort by dates
    • Sort by booleans (TRUE or FALSE values).
  • Select text
  • Additional options will then appear for you to fine-tune your sorting, select a-z
  • You can also specify what order to put blanks and errors in the sorted results
  • Drag and move blanks and errors boxes to the top of the list.
  • Ok
  • The dataset is now sorted in alphabetical order by Area.

When performing the first sort, you will notice the drop-down menu for the selected column shows Sort ...

If you try to re-sort a column that you have already used, the drop-down menu changes slightly, to > Sort  without the  ..., to remind you that you have already used this column. Additional options include:

  • Sort > Sort ... - This option enables you to modify your original sort.
  • Sort > Reverse - This option allows you to reverse the order of the sort.
  • Sort > Remove sort - This option allows you to undo your sort.
  • Remove this sort.

Sort by multiple columns

You can also sort by multiple columns and this makes it easier to explore data easily before analysis and processing. Do this by performing a sort on additional columns.

The sort results will depend on the order in which you select columns to sort.

To restart the sorting process with a specific column, check the  sort by this column alone  box in the  Sort pop-up menu.

If you go back to one of the already sorted columns and select Sort > Remove sort , that column is removed from your multiple sort. If it is the only column sorted, then data reverts to its original order.

Activity - sort by multiple columns
  • Sort Area again, select  Sort... > text and a-z
  • Add an additional sort by Location, select Sort... > text and a-z
  • It is now easy to see an alphabetically list of all the locations in Bundaberg.
  • Remove sort

Sorts in OpenRefine are temporary. If you remove the Sort, the data will go back to its original unordered state.

The Sort drop-down menu at the top of the screen also lets you amend the existing sort to:

Find and fix missing values by sorting

Let’s try to find if there are missing values for the Species Code variable.

Activity - find & fix missing values by sorting
  • In  Species Code column, select  Sort... > numbers
  • in pop up box select smallest first and reposition Errors & Blanks to the top of the column.
  • Ok

Notice the first three values are missing. They are all associate witht the species name Tiger Shark. You may be able to identify the missing value by other variable attributes.

When you know your data or look at the variable in context with other similar observations you can identify what type of data problem might be occurring and possible solutions.

In this case, you could search other variables such as Species Name to identify correct Species Code details.

  • Keep the current sort on Species Code then
  • Go to Species Name, Edit > Text Facet
  • Scroll down to and hover over Tiger Shark then click include

It is now possible to see the missing code 37018022 which matches the species name Tiger Shark

  • Highlight and copy Cntr C the Species Code
  • Hover over empty cell Edit
  • Paste Species code in popup box
  • Select Apply to All Identical cells button. This will fill all the empty cells with the same value.

Notice the yellow Mass Edit notice listing how many records were changed.

Watch the steps to sort data in this video.

<-- BACK | NEXT -->