Diffbot: An Artificial Intelligence web crawling platform
Diffbot is an AI startup that provides knowledge as a service to power intelligent applications. It aims to convert the existing web into the world’s largest database of structured data, and it offers a set of APIs that let developers use web data in their own applications.
Why do you see Diffbot in the SEO tools stash? Because professional SEO, digital marketing, SMM and growth hacking are heavily (and increasingly) based on data. Diffbot is not an SEO platform, but it is one of the best modern platforms for web scraping, data mining and data extraction. So I’m pretty sure you should know about it.
Diffbot’s products:
- Crawly – a mini-app by Diffbot
- Crawlbot – smart spidering and bulk processing
- Automatic APIs – automatic data extraction based on AI technology
- Custom APIs – correct the output of the Automatic APIs and create your own custom APIs
Diffbot features (including technical ones):
- Content Extractor works without rules or training – the best way to extract data from web pages
- Identify pages automatically – the Analyze API finds and extracts all products, articles, discussions, videos or images while crawling any site
- Detailed product data – The Product API automatically returns complete product info, including all pricing data, product IDs, brand and full specifications tables
- Clean text and HTML – articles, discussion threads, product descriptions and image captions are returned in pure text and sanitized HTML
- Structured Search – search structured content from any crawl on the fly using the Search API, returning only the matching results
- All APIs execute JavaScript, so content is parsed the way a regular browser would render it
- Works on most non-English pages thanks to visual processing
- Multipage articles are automatically joined together in a single API response
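
To give a feel for how the extraction APIs listed above are typically called, here is a minimal Python sketch. It is not official client code: the base URL, the `token` and `url` query parameters, and the `objects` list in the response are assumptions based on Diffbot's public v3 documentation, and `build_request_url` and `first_object` are hypothetical helper names made up for this example. Check the current docs before relying on any of it.

```python
from urllib.parse import urlencode

# Assumed base URL for Diffbot's v3 APIs; verify against the official docs.
DIFFBOT_BASE = "https://api.diffbot.com/v3"


def build_request_url(api: str, page_url: str, token: str) -> str:
    """Build a GET URL for an extraction API such as 'analyze', 'article' or 'product'.

    The page URL is percent-encoded so its own query string survives intact.
    """
    query = urlencode({"token": token, "url": page_url})
    return f"{DIFFBOT_BASE}/{api}?{query}"


def first_object(response: dict) -> dict:
    """Return the first extracted item from a decoded JSON response, or {} if none.

    Assumes extracted items arrive in a top-level 'objects' list.
    """
    objects = response.get("objects") or []
    return objects[0] if objects else {}


if __name__ == "__main__":
    # YOUR_TOKEN is a placeholder, not a working credential.
    print(build_request_url("article", "https://example.com/post?id=1", "YOUR_TOKEN"))
```

To actually fetch data, you would pass the resulting URL to any HTTP client (for example `urllib.request.urlopen`), decode the JSON body and hand it to `first_object`. Swapping `"article"` for `"analyze"` or `"product"` would target the other APIs mentioned in the list, under the same assumptions.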