There are a lot of data out there. Whenever you traverse the Internet you encounter countless vast databases presenting their content in the form of web pages. It would be so nice to grab some of these resources and use them in your research. Unfortunately, owners of these databases are not as eager to share them with you as you would like them to be. So, you can ask your research assistant to sit down and copy the data by hand into an Excel spreadsheet. But wait... there are 135,622 records and each has seven variables. Assuming copying each variable takes your RA five seconds and opening a new web page takes 15 seconds, each record costs 50 seconds, so your RA will work about 1,884 hours, or roughly 78 days around the clock without eating or sleeping, before (s)he gets everything. At a more humane eight hours a day, that is about 235 working days. And what if (s)he makes a mistake in the 59,453rd record? Regardless, by the time data collection is finished, your RA will probably have graduated and gracefully disappeared. What else can be done? Perhaps if there were a machine that could complete the dull, repetitive, and time-consuming task of collecting entries from the Internet...
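If you want to check the arithmetic for your own dataset, here is a minimal Python sketch. It assumes one web page per record, which the example implies but does not state, and the function name is purely illustrative.

```python
def manual_collection_seconds(records, variables_per_record,
                              seconds_per_variable, seconds_per_page):
    """Seconds needed to copy every record by hand.

    Assumption: each record lives on its own web page, so the page-opening
    cost is paid once per record.
    """
    per_record = variables_per_record * seconds_per_variable + seconds_per_page
    return records * per_record


# Figures from the example: 135,622 records, seven variables each,
# 5 seconds per variable, 15 seconds to open a page.
total = manual_collection_seconds(135_622, 7, 5, 15)
hours = total / 3600
days = hours / 24

print(f"{total:,} seconds = {hours:,.0f} hours = {days:.1f} days of nonstop work")
# prints: 6,781,100 seconds = 1,884 hours = 78.5 days of nonstop work
```

Plug in your own record count and timings to see how quickly copying by hand stops being a reasonable plan.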

Meet me. I create such machines. So far, I've written programs to obtain data from auction websites, patent databases, and online retailers. I even invented a name for this: Data Threshing.

Data Threshing

Below, you can obtain a rough estimate of how much it is going to cost you.

Number of web pages to download:
Number of tables in the dataset:
Total number of variables:
I am going to like Data Threshing on Facebook.
I am not going to like Data Threshing on Facebook.

Your estimate is:

This estimate is not binding. Your price may be different, especially if the website from which you want to obtain data is crawler-unfriendly, is not in plain HTML, or if you need scheduled data downloads, for example at a particular time every day. To get a precise and binding quote, send me an email with a sample of what you need and the Internet address of the data source. The sample should contain all the tables, and each table should contain at least three rows of data. Make sure that you include all the variables you need, because changing your mind after the process has started will cost you $50 for each new variable. You should also say which entries you are interested in (e.g., all sales between March 1997 and August 2006 in the Garden & Kitchen category).

The whole process usually takes two to four weeks unless the number of files to download exceeds 100,000. The data will be stored in CSV format. You may request conversion to any other format for an additional fee.

Feel free to contact me if you have any questions:
mobile phone: +1.352.254.2490

You can also visit Data Threshing on Facebook.