I wrote a Chrome extension to scrape college admission data.

Recently, I have been working on a project that trains a model to predict college admission results. My teammate and I were searching for data on https://www.collegedata.com. Unfortunately, the website doesn't offer data downloads, so I started writing a program to scrape the data.
My first implementation used Cheerio (https://github.com/cheeriojs/cheerio), a Node.js library that is good at parsing and searching HTML nodes.
However, I had to hand-type a bunch of variables as inputs to the program, and it was not convenient for my teammate to use: he needed to install Node.js and have some basic knowledge of how Node.js works.

My second implementation was a Chrome extension that uses jQuery to scrape the data. I packed the extension and sent it to my teammate to install and use.
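The post doesn't show content.js itself, so here is a minimal, hypothetical sketch of what a jQuery scraping step can look like, assuming the results are rendered as an HTML table. The selector "table" and the names scrapeRows and cleanRows are placeholders I made up; they are not taken from the real code or from collegedata.com's markup.

```javascript
// Hypothetical selector: the post does not show the markup of the
// admission-tracking results, so "table" stands in for the real one.
const RESULTS_SELECTOR = "table";

// Trims every cell and drops rows whose cells are all empty.
function cleanRows(rawRows) {
  return rawRows
    .map((row) => row.map((cell) => String(cell).trim()))
    .filter((row) => row.some((cell) => cell !== ""));
}

// Browser-only: reads each <tr> under the selector into an array of cell texts.
// jQuery's .map() flattens returned arrays one level, so each row is wrapped
// in an extra [] to keep it as a single element of the result.
function scrapeRows($, selector = RESULTS_SELECTOR) {
  const rawRows = $(selector)
    .find("tr")
    .map(function () {
      const cells = $(this)
        .find("th, td")
        .map(function () {
          return $(this).text();
        })
        .get();
      return [cells];
    })
    .get();
  return cleanRows(rawRows);
}
```

Keeping the cleanup in a plain function (cleanRows) separates the part that depends on the page's markup from the part that doesn't, which makes the latter easy to check outside the browser.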

Since it is not a commercial extension, I kept things simple.

Files

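  • manifest.json: The extension's configuration.
  • content.js: My code that creates the UI and scrapes the data.
  • FileSaver.js: Saves the scraped data as a file. (https://github.com/eligrey/FileSaver.js/)
  • jquery-3.3.1.min.js: The jQuery library used by content.js.
  • icon.png: The toolbar icon referenced in manifest.json.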

manifest.json

{
  "manifest_version": 2,
  "name": "Collegedata-Scraper",
  "version": "0.1",
  "content_scripts": [
    {
      "matches": [
        "https://www.collegedata.com/*"
      ],
      "js": ["jquery-3.3.1.min.js", "FileSaver.js", "content.js"]
    }
  ],
  "browser_action": {
    "default_icon": "icon.png"
  }
}

In the “js” section, the order matters: Chrome injects the scripts in the order they are listed, so the libraries must come before content.js.
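Because content scripts from the same extension share one global scope in the page and run in that order, content.js can use the jQuery and saveAs globals directly. A hypothetical guard like the one below (missingDependencies is a name I made up) makes a wrong order easy to spot while developing:

```javascript
// content.js is listed last in manifest.json, so by the time it runs the
// earlier files should have defined their globals: jQuery sets `jQuery`
// (and `$`), and FileSaver.js sets `saveAs`.
function missingDependencies(scope) {
  const required = ["jQuery", "saveAs"];
  return required.filter((name) => typeof scope[name] !== "function");
}

// Only runs inside the browser; outside it there is no `window`.
if (typeof window !== "undefined") {
  const missing = missingDependencies(window);
  if (missing.length > 0) {
    // Most likely cause: the "js" array lists content.js before the libraries.
    console.error("content.js loaded before: " + missing.join(", "));
  }
}
```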

Install and pack your extension

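In Chrome, open the Extensions page (chrome://extensions) and toggle on Developer mode. Then click “Load unpacked” and select the extension's folder. To pack it for sharing, click “Pack extension” on the same page and select the same folder; Chrome generates a .crx file (plus a .pem key file).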


Screenshots

After installing the extension, a floating bar appears in the top-left corner of https://www.collegedata.com.

The extension only works when there is data on the page. Otherwise, it just alerts “No data found”.
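A hypothetical sketch of the floating bar and the “No data found” check, not the code from the repository. The names handleDownload and createFloatingBar and all of the styling values are my own assumptions; the alert function and the save step are passed in as parameters so the decision logic can run without a browser.

```javascript
// Decides what the "Download" button does. `notify` would be window.alert
// in the browser; `save` receives the rows when there is something to save.
function handleDownload(rows, notify, save) {
  if (!Array.isArray(rows) || rows.length === 0) {
    notify("No data found");
    return false;
  }
  save(rows);
  return true;
}

// Browser-only: builds a small bar fixed to the top-left corner of the page
// with a single "Download" button that calls onClick.
function createFloatingBar($, onClick) {
  const bar = $("<div>").css({
    position: "fixed",
    top: 0,
    left: 0,
    zIndex: 2147483647, // keep the bar above the site's own elements
    padding: "4px 8px",
    background: "#fff",
    border: "1px solid #888",
  });
  $("<button>").text("Download").on("click", onClick).appendTo(bar);
  bar.appendTo(document.body);
  return bar;
}
```

In content.js, the click handler passed to createFloatingBar would gather the rows from the page and call handleDownload(rows, (msg) => alert(msg), save) with whatever save step writes the file.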

So, go to the admission tracking page and type in the university and the years you want to look at. Here, I typed in Stanford and the years 2010 to 2020.

Once the data is retrieved from the server, click the “Download” button, then save the data.csv file to your desktop.
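Saving the file boils down to turning the rows into CSV text and handing a Blob to FileSaver.js. Here is a hedged sketch, assuming the rows are arrays of strings; toCsv, toCsvCell, and downloadCsv are hypothetical names, while saveAs(blob, filename) is the global function FileSaver.js provides.

```javascript
// Quotes a cell only when needed: fields containing a comma, double quote,
// or line break are wrapped in double quotes, with embedded quotes doubled
// (the convention described in RFC 4180).
function toCsvCell(value) {
  const text = String(value ?? "");
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Joins cells with commas and rows with CRLF line endings.
function toCsv(rows) {
  return rows.map((row) => row.map(toCsvCell).join(",")).join("\r\n");
}

// Browser-only: wraps the CSV text in a Blob and lets FileSaver.js save it.
function downloadCsv(rows, filename = "data.csv") {
  const blob = new Blob([toCsv(rows)], { type: "text/csv;charset=utf-8" });
  saveAs(blob, filename);
}
```

The quoting matters because a scraped value containing a comma would otherwise be split into extra columns when the file is opened in a spreadsheet.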

Open it with Microsoft Excel.

Bravo!!!

Link to the code: https://bitbucket.org/JunchengHan/collegedata-scraper-extension/src