Last Update: March 22, 2022
BY eric
Overview
In short, Mr. Table is a Chrome browser extension that helps extract data from the tables (e.g. <table>*</table>) in a web page. The extracted data can be saved in either CSV or JSON format.
We often need to collect data from the Internet for work or study. However, data on web pages is rarely in the format we want. For example, most data on web pages is presented using the HTML tags <table></table> or <div></div>, but we want data that our programs or tools (e.g. Excel) can process.
With "Mr. Table", data can be converted from what you see on a web page into a format that you can actually use.
To get the 'Mr. Table' Chrome extension, please visit this link.
Table Types
HTML Table
Data is often presented using the HTML table and related tags, as follows:
- <table> for the table
  - <thead> for the table column names
    - <tr> for the table header row
    - <th> for a table header cell
  - <tbody> for the actual data
    - <tr> for a data row
    - <td> for a data cell
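To make the tag roles above concrete, here is a minimal Python sketch that reads this structure with the standard library's html.parser module. It only illustrates how <th> and <td> cells map to a header and data rows; it is not how 'Mr. Table' itself is implemented. The TableParser class, the parse_table helper, and the sample markup are hypothetical names and data chosen for this example.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect <th> texts as the header and <td> texts as data rows."""

    def __init__(self):
        super().__init__()
        self.header = []   # text of every <th> cell
        self.rows = []     # one list of <td> texts per data <tr>
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the current <th>/<td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a header or data cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            text = "".join(self._cell).strip()
            if tag == "th":
                self.header.append(text)
            elif self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == "tr":
            # A header row holds only <th> cells, so its row list stays empty.
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return (header, rows) for the table markup in `html`."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.header, parser.rows


# Hypothetical sample markup, for illustration only.
SAMPLE = """
<table>
  <thead><tr><th>ID</th><th>Name</th><th>Age</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>John Doe</td><td>23</td></tr>
    <tr><td>2</td><td>John Smith</td><td>37</td></tr>
  </tbody>
</table>
"""

header, rows = parse_table(SAMPLE)
print(header)
print(rows)
```

This sketch assumes a single, non-nested table; a page with several tables would need the parser to start a new header and row list at each <table> tag.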
Table with Pure CSS
Data is also often laid out with CSS alone. In that case the data is grouped in <div> tags and styled with CSS classes, for example:
<ol>
  <!-- Column header is the first item of list -->
  <li>
    <div>#</div>
    <div>
      <div>ID</div>
      <div>Name</div>
      <div>Age</div>
    </div>
    ...
  </li>
  <li>
    <div>1</div>
    <div>
      <div>John Doe</div>
      <div>23</div>
    </div>
    ...
  </li>
  <li>
    <div>2</div>
    <div>
      <div>John Smith</div>
      <div>37</div>
    </div>
    ...
  </li>
  ...
</ol>
Usage
'Mr. Table'
Installation
First, visit the Chrome Web Store and click 'Add to Chrome'.
Once the extension is installed, you can pin 'Mr. Table' to the toolbar, where its icon will appear.
Extension UI
After opening a web page containing the data you would like to extract, click the 'Mr. Table' icon to open the popup window.
It offers a few options:
- You can select the output file type: CSV or JSON
- If you click "Advanced Options", you can see a list of selectors.
- A single page may contain many tables. To extract only one of them, highlight that particular table, then click "Export Selected".
- To make things simpler, you can just click "Export All".
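As a rough illustration of the two output file types, the Python sketch below writes the same small table as CSV and as JSON using only the standard library. The exact layout that 'Mr. Table' produces is not documented here, so the JSON shape (a list of objects keyed by column name) is an assumption, and to_csv, to_json, and the sample data are hypothetical.

```python
import csv
import io
import json


def to_csv(header, rows):
    """Serialize a header and data rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def to_json(header, rows):
    """Serialize each row as an object keyed by column name (assumed layout)."""
    records = [dict(zip(header, row)) for row in rows]
    return json.dumps(records, indent=2)


# Hypothetical sample data, for illustration only.
header = ["ID", "Name", "Age"]
rows = [["1", "John Doe", "23"], ["2", "John Smith", "37"]]

print(to_csv(header, rows))
print(to_json(header, rows))
```

CSV suits spreadsheet tools such as Excel, while JSON keeps the column names attached to every value, which is often more convenient for programs.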
For example, to extract the code change tables from the ASX website for my algorithmic trading system, I would go to this page.
After extracting all the tables, you can download each table individually.
Export Data from the <table> Table
For data presented with the HTML <table> tag, you can simply use the default settings. The preset table selector, column selector, cell selector, etc. can be found in the advanced options.
Export Data from CSS Table
Unfortunately, such tables can only be extracted by specifying the selectors manually.
The following HTML is the code for a sample table built with pure CSS:
<div class="table">
  <div>
    <div class="table-header">
      <div class="table-header-cell">ID</div>
      <div class="table-header-cell">Name</div>
      <div class="table-header-cell">Age</div>
    </div>
  </div>
  <div>
    <div class="table-row">
      <div class="table-cell">1</div>
      <div class="table-cell">John Doe</div>
      <div class="table-cell">23</div>
    </div>
  </div>
  <div>
    <div class="table-row">
      <div class="table-cell">2</div>
      <div class="table-cell">John Smith</div>
      <div class="table-cell">37</div>
    </div>
  </div>
</div>
So we can set the selectors to:
- Table Selector: .table
- Header Row Selector: .table-header
- Header Cell Selector: .table-header-cell
- Data Row Selector: .table-row
- Data Cell Selector: .table-cell
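To show how the selectors listed above pick out the pieces of such a table, here is a minimal Python sketch built on the standard library's html.parser module. It is not the extension's own code: DivTableParser and its parameter names are hypothetical, it only understands simple single-class selectors such as .table-cell, and it ignores the table and header row selectors by treating the whole input as one table.

```python
from html.parser import HTMLParser


class DivTableParser(HTMLParser):
    """Extract a <div>-based table using simple class selectors."""

    def __init__(self, header_cell_selector=".table-header-cell",
                 row_selector=".table-row", cell_selector=".table-cell"):
        super().__init__()
        # Map each class name (selector without the leading dot) to a role.
        self._roles = {
            header_cell_selector.lstrip("."): "header_cell",
            row_selector.lstrip("."): "row",
            cell_selector.lstrip("."): "cell",
        }
        self.header = []
        self.rows = []
        self._stack = []   # role (or None) of every currently open <div>
        self._text = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        role = next((self._roles[c] for c in classes if c in self._roles), None)
        self._stack.append(role)
        if role == "row":
            self.rows.append([])
        elif role in ("header_cell", "cell"):
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "div" or not self._stack:
            return
        # The popped role tells us which kind of <div> is closing.
        role = self._stack.pop()
        if role in ("header_cell", "cell"):
            text = "".join(self._text).strip()
            if role == "header_cell":
                self.header.append(text)
            elif self.rows:
                self.rows[-1].append(text)
            self._text = None


# The sample markup from this section.
SAMPLE = """
<div class="table">
  <div><div class="table-header">
    <div class="table-header-cell">ID</div>
    <div class="table-header-cell">Name</div>
    <div class="table-header-cell">Age</div>
  </div></div>
  <div><div class="table-row">
    <div class="table-cell">1</div>
    <div class="table-cell">John Doe</div>
    <div class="table-cell">23</div>
  </div></div>
  <div><div class="table-row">
    <div class="table-cell">2</div>
    <div class="table-cell">John Smith</div>
    <div class="table-cell">37</div>
  </div></div>
</div>
"""

parser = DivTableParser()
parser.feed(SAMPLE)
parser.close()
print(parser.header)
print(parser.rows)
```

Keeping a stack of roles for the open <div> tags is what lets the sketch tell a closing cell from a closing wrapper, since every closing tag is just </div>.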
After setting the proper selectors in the advanced options, click "Export All". The sample table above is then exported correctly.
With more support, we can work on a smarter way to extract data from such tables.
Related project
You can also see this project for exporting tables directly with JavaScript (Node.js).
Support and Contact
You can contact the developer via:
- Twitter: Eric Tang
- Email: [email protected]