OpenRefine, Part 1: Navigating, Faceting, Cleaning

Pragmatic Datafication - DSVIL 2018

John Little

2018-05-03

Data Cleaning

Data cleaning includes, but is not limited to, the following actions:

  • import
  • export
  • merge data
  • handle missing data
  • standardize and normalize data
  • deduplicate
  • verify, enrich, and enhance

https://www.quora.com/What-steps-should-be-included-in-a-data-cleansing-process

Image Credit: John Little
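
Several of the actions listed above (import, handle missing data, standardize, deduplicate, export) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not how OpenRefine does it; the sample rows, the function names `clean_rows` and `export_csv`, and the `UNKNOWN` placeholder are all assumptions made up for the example.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
RAW_CSV = """name,city
Ada Lovelace,London
  ada lovelace ,london
Grace Hopper,
Grace Hopper,New York
"""

def clean_rows(raw_text, missing="UNKNOWN"):
    """Import CSV text, standardize values, fill missing cells, deduplicate."""
    reader = csv.DictReader(io.StringIO(raw_text))  # import
    seen = set()
    cleaned = []
    for row in reader:
        # standardize: trim surrounding whitespace and use title case
        record = {key: (value or "").strip().title() for key, value in row.items()}
        # handle missing data: replace empty cells with a placeholder
        record = {key: value or missing for key, value in record.items()}
        # deduplicate: keep only the first copy of each standardized record
        identity = tuple(record.values())
        if identity not in seen:
            seen.add(identity)
            cleaned.append(record)
    return cleaned

def export_csv(rows, fieldnames):
    """Export cleaned records back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = clean_rows(RAW_CSV)
print(export_csv(rows, ["name", "city"]))
```

Note that the two "Ada Lovelace" rows only collapse into one after standardization; deduplicating before trimming and re-casing would keep both.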

OpenRefine

  • Demonstration

  • Facets & Clusters

  • Split

  • Concatenate

  • Search & Replace

  • GREL
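
For readers who want to see the ideas behind these features outside of OpenRefine, here is a rough Python sketch. The `fingerprint` function is loosely modeled on the key-collision idea used for clustering (trim, lowercase, strip punctuation, sort unique tokens); it is an approximation, not OpenRefine's actual implementation. All function names and sample values are assumptions for illustration.

```python
import re
import string
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Build an approximate fingerprint-style key for grouping similar text."""
    # fold accented characters to plain ASCII where possible
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # trim, lowercase, and remove punctuation
    bare = ascii_text.strip().lower().translate(str.maketrans("", "", string.punctuation))
    # split into tokens, drop duplicate tokens, sort, and rejoin
    return " ".join(sorted(set(bare.split())))

def cluster(values):
    """Group values that share a key; return only groups with real variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(set(members)) > 1]

def reorder_name(value):
    """Split 'Last, First' on the comma, then concatenate as 'First Last'."""
    last, first = [part.strip() for part in value.split(",", 1)]
    return first + " " + last

def collapse_spaces(value):
    """Search and replace: squeeze runs of whitespace into a single space."""
    return re.sub(r"\s+", " ", value).strip()

# Hypothetical messy values, invented for illustration only.
names = ["Duke University", "university, duke", "Duke  University.", "UNC"]
print(cluster(names))
print(reorder_name("Little, John"))
print(collapse_spaces("Data   &  Visualization   Services"))
```

In OpenRefine itself, the same kinds of steps are done through the Facet, Cluster, and Edit cells menus or with GREL expressions rather than Python.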

Now You Try It

  1. Exercise 1: Basic Transformations

  2. Exercise 2: GREL

Warning: slow processing for exercise 1.

The sample dataset for Exercise 1 is large relative to OpenRefine's default memory allocation, so with a standard OpenRefine installation you will likely experience slow processing. You can allocate more memory to your OpenRefine instance, although this is not recommended during this training exercise. Slow processing is painful, but your patience will let you focus on the data transformations and tasks as they take place. Later, refer to the official OpenRefine FAQ on allocating memory.

John Little

I am ...

John Little

Your Rfun host...

You can make Rfun with our resources for R and data science analytics. See the R we having fun yet‽ resource pages.

Duke University...

Data & Visualization Services

Shareable under CC BY-NC license

Data, presentation, and handouts are shareable under CC BY-NC license

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
