Data Crawlers & Spiders

One of our core competencies is extracting data from any part of the web (and other data sources); we have written large custom distributed crawlers to fetch large volumes of data on both Windows™ and Linux platforms. Over time, we have developed our own custom libraries for distributed web crawling and parsing, and we can crawl data from complex websites with a short turnaround time.

Projects:

Custom Crawlers

Custom crawlers, built to client requirements, that extract data from various websites and store it in databases.

We have developed a plethora of crawlers to obtain data from a diverse range of sites. We first study the target site: its access restrictions, the data to extract, the time estimates, and the database design required to store the results. We then execute with the shortest possible turnaround time, using our custom-built libraries to crawl the target site(s) in a distributed fashion (a minimal sketch follows below). Overall, we have crawled, for dozens of clients, publicly available data from websites (such as government sites), aggregator sites, and competitor websites, as well as relevant statistics, surveys, reports, and documents made available behind forms or other restrictions.
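To illustrate the general pattern (not our production libraries), here is a minimal sketch of a polite, parallel crawler in Python. The seed URL, crawl budget, politeness delay, and worker count are all illustrative assumptions; a production crawler would also honour robots.txt, persist results to a database, and distribute the frontier across machines.

```python
# Minimal sketch of a parallel, same-host crawler.
# SEEDS, MAX_PAGES, and DELAY_SECONDS are illustrative assumptions.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

SEEDS = ["https://example.com/"]  # hypothetical seed list
MAX_PAGES = 50                    # assumed crawl budget
DELAY_SECONDS = 1.0               # assumed per-request politeness delay


def fetch(url: str) -> tuple[str, list[str]]:
    """Fetch one page and return (url, same-host outgoing links)."""
    time.sleep(DELAY_SECONDS)  # be polite to the target site
    resp = requests.get(url, timeout=10, headers={"User-Agent": "demo-crawler"})
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    links = [urljoin(url, a["href"]) for a in soup.find_all("a", href=True)]
    host = urlparse(url).netloc
    return url, [link for link in links if urlparse(link).netloc == host]


def crawl(seeds: list[str]) -> set[str]:
    """Breadth-first crawl from the seeds until the budget is exhausted."""
    seen, frontier = set(seeds), list(seeds)
    with ThreadPoolExecutor(max_workers=8) as pool:
        while frontier and len(seen) < MAX_PAGES:
            batch, frontier = frontier, []
            futures = [pool.submit(fetch, url) for url in batch]
            for fut in as_completed(futures):
                try:
                    _, links = fut.result()
                except requests.RequestException:
                    continue  # skip pages that fail to download
                for link in links:
                    if link not in seen:  # budget counts discovered URLs
                        seen.add(link)
                        frontier.append(link)
    return seen


if __name__ == "__main__":
    print(f"Discovered {len(crawl(SEEDS))} URLs")
```

The thread pool here stands in for distribution: in a multi-machine setup, the frontier would live in a shared queue and each worker process would run the same fetch-and-parse loop.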

Patent Data Crawler and Analyser

This crawler was written to crawl and parse publicly available patent data from the USPTO and WIPO sites. Developing this kind of crawler requires a sophisticated understanding of the websites and their data design.
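As a sketch of the parsing half of such a crawler, the snippet below extracts bibliographic fields from patent grant XML. The tag names follow the general shape of USPTO full-text XML, but the exact schema varies by publication year, so both the sample document and the field paths should be treated as illustrative assumptions.

```python
# Sketch: extracting bibliographic fields from patent grant XML.
# The sample document and tag names are assumptions for illustration;
# real USPTO/WIPO schemas must be checked against the bulk-data docs.
import xml.etree.ElementTree as ET

SAMPLE = """
<us-patent-grant>
  <us-bibliographic-data-grant>
    <publication-reference>
      <document-id>
        <doc-number>1234567</doc-number>
        <date>20200114</date>
      </document-id>
    </publication-reference>
    <invention-title>Example widget</invention-title>
  </us-bibliographic-data-grant>
</us-patent-grant>
"""


def parse_patent(xml_text: str) -> dict:
    """Pull the document number, grant date, and title out of one record."""
    root = ET.fromstring(xml_text)
    return {
        "doc_number": root.findtext(".//doc-number"),
        "date": root.findtext(".//date"),
        "title": root.findtext(".//invention-title"),
    }


print(parse_patent(SAMPLE))
# -> {'doc_number': '1234567', 'date': '20200114', 'title': 'Example widget'}
```

In practice, the analyser side of the project normalises these records into database tables so that filings can be queried by date, assignee, or classification.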