APIs & JSON Parsing

Pragmatic Datafication - DSVIL 2018

John Little

2018-05-30

Application Program Interface

A set of rules and protocols used to build software applications and let them communicate. In the context of web scraping, an API is a method for gathering clean data from a website (data not wrapped in HTML, JavaScript, etc.).

  • Built for machine-to-machine interactions

  • Instructions for programs

API Schematic

   +> Data Request +
   |               |
   |               |
my application     +> API <--> Target Server (application)
   ^                   |
   |                   |
   +-------------------+

http://www.gratasoftware.com/the-value-of-an-api/

Image Credit: John Little API schematic

Why Use APIs?

  • Get data in BULK

    • Text Analysis projects
    • Frontier beyond curated datasets
    • Coerce or process your target data into more usable data structures by eliminating HTML
    • JSON and XML are easier to parse than HTML
    • Easier to generate tidy dataframes

However, consider Intellectual Property issues

  • Is all web data scrapable?
  • You can get shut out

Client / Server

  • Same as human-to-machine (h2m), but now machine-to-machine (m2m)

Image Credit: Client / Server

Simulation...

  • Person enters a URL

Parts of URL

  • Client & server negotiate handshake (dramatization...)

dramatization: good handshake
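
As a small illustration of the "Parts of URL" image above, the following sketch splits a URL into its components with Python's standard library. The URL is hypothetical and exists only to show the pieces; nothing is actually requested from it.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL (an assumption for illustration; not from the slides).
url = "https://api.example.com:443/v1/search?q=duke&format=json#results"

parts = urlsplit(url)

print("scheme:  ", parts.scheme)           # protocol the client will speak
print("host:    ", parts.hostname)         # server the client will contact
print("port:    ", parts.port)             # network port on that server
print("path:    ", parts.path)             # resource on the server
print("query:   ", parse_qs(parts.query))  # key-value parameters sent to the server
print("fragment:", parts.fragment)         # handled by the client, not sent to the server
```

When calling an API, the query string is usually where the request parameters (and often the API key) go.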

  • Web Browser parses the HTML

happy parsing dance

Ever seen HTML before?

Image Credit: happy parsed dance

JSON

  • JavaScript Object Notation (JSON) is a language-independent data format
  • Currently the most common data format for asynchronous client/server communication
  • Consists of key-value pairs
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    },
    {
      "type": "mobile",
      "number": "123 456-7890"
    }
  ],
  "children": [],
  "spouse": null
}
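
To make the key-value structure concrete, here is a minimal sketch that parses a trimmed copy of the record above with Python's built-in `json` module, reads nested values, and flattens the phone numbers into one row per number. The variable names and the row layout are illustrative choices, not part of the original slides.

```python
import json

# Trimmed copy of the example record above.
raw = """
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {"city": "New York", "state": "NY"},
  "phoneNumbers": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "office", "number": "646 555-4567"}
  ],
  "children": [],
  "spouse": null
}
"""

# JSON objects become dicts, arrays become lists, null becomes None.
person = json.loads(raw)

city = person["address"]["city"]                   # nested key lookup
first_phone = person["phoneNumbers"][0]["number"]  # index into an array, then a key

# One row per phone number: a flat shape that loads easily into a data frame.
rows = [
    {
        "firstName": person["firstName"],
        "lastName": person["lastName"],
        "type": phone["type"],
        "number": phone["number"],
    }
    for phone in person["phoneNumbers"]
]

for row in rows:
    print(row)
```

The same dotted-path idea (object key, then array index, then key) is what a JSON viewer helps you discover before you write the parsing expression.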

m2m -- development

  • Make a web API, or interface with an existing one

  • Same as h2m, but now m2m

dramatization...

dramatization: confused about the protocol

API Keys

  • Keep your Secret Key to yourself
  • Keys are used by the API provider for tracking application usage
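
One common way to keep a secret key to yourself is to read it from an environment variable instead of typing it into a script or a shared project. The sketch below assumes a variable named `GEOCODING_API_KEY`; that name, and both helper functions, are hypothetical and chosen only for illustration.

```python
import os


def load_api_key(var_name: str = "GEOCODING_API_KEY") -> str:
    """Read the key from the environment so it never appears in shared code."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


def mask(key: str) -> str:
    """Hide all but the last four characters, e.g. for log messages."""
    return "*" * max(len(key) - 4, 0) + key[-4:]
```

A script would call `load_api_key()` once at startup and print only `mask(key)` if it needs to confirm which key is in use.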

Demonstration

  1. Import Data: https://raw.githubusercontent.com/libjohn/openrefine/master/data/sample-us-address-data.csv

  2. Make Full Address:

    • value + " " + cells["city"].value + " " + cells["state"].value + " " + cells["zip"].value
  3. Fetch from Google Geocoding API

    • 'https://maps.googleapis.com/maps/api/geocode/json?' + 'sensor=false&key=<<INSERT YOUR Google Console Key>>' + '&address=' + escape(value, 'url')
      (See the slide notes; press p for presenter mode.)
  4. JSON Viewer: http://jsonviewer.stack.hu/

  5. Longitude: value.parseJson().results[0].geometry.location.lng

  6. Latitude: value.parseJson().results[0].geometry.location.lat
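
For readers who prefer a script to OpenRefine, steps 2 through 6 can be sketched in Python roughly as follows. This is a sketch under assumptions, not the demonstration itself: the `address` column name, the sample row, and the sample response are made up, and the response contains only the fields that the parsing expressions above rely on. No request is actually sent.

```python
import json
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def full_address(row: dict) -> str:
    """Step 2: join the address columns with single spaces.

    The "address" column name is an assumption; city, state, and zip
    follow the expression shown in the demonstration.
    """
    return " ".join([row["address"], row["city"], row["state"], row["zip"]])


def geocode_url(address: str, api_key: str) -> str:
    """Step 3: build the request URL; urlencode percent-encodes the address."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})


def lat_lng(response_text: str):
    """Steps 5 and 6: follow results[0].geometry.location, or return None."""
    results = json.loads(response_text).get("results", [])
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Made-up row and response, for illustration only.
row = {"address": "1 Main St", "city": "Durham", "state": "NC", "zip": "27701"}
sample_response = '{"results": [{"geometry": {"location": {"lat": 36.0, "lng": -78.9}}}]}'

print(geocode_url(full_address(row), "YOUR_KEY"))
print(lat_lng(sample_response))
```

Returning `None` when `results` is empty mirrors what happens when an address cannot be geocoded, so one bad row does not stop a batch.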

Demonstration Notes:

First, enable the Google Geocoding API:

  1. https://console.developers.google.com
  2. Enable APIs and Services; search for "geocoding"; select the API
  3. Enable
  4. Credentials tab

    • key is ALSO in my lastpass
  5. http://v.gd/parsing3333 -- OR --
    https://docs.google.com/document/d/1ZiHC1v595tf2NAhv4vVdRYy-Ro78Bc37Y0gs1TfGBco/edit
    Demonstrate JSON parsing with OpenRefine

Now You Try

  1. API and Parsing

  2. API with Keys

John Little

I am ...

John Little

Your Rfun host...

You can make Rfun with our resources for R and data science analytics. See the R we having fun yet‽ resource pages.

Duke University...

Data & Visualization Services

Shareable under CC BY-NC license

Data, presentation, and handouts are shareable under CC BY-NC license

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
