Scrapy: An open source collaborative framework for easy and fast web data extraction
Scrapy is a data mining and web scraping framework for Python, and one of the most popular and advanced. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler. Scrapy is maintained by Scrapinghub, a web scraping development and services company (its other tools are Portia, Crawlera, Scrapy Cloud and Splash).
Scrapy features:
- Build and run your own web spiders
- Deploy them to Scrapy Cloud
- Fast and powerful: write the rules to extract the data and let Scrapy do the rest
- Extensible by design: plug in new functionality without touching the core
- Portable: written in Python and runs on Linux, Windows, Mac and BSD
- Healthy community on GitHub, Twitter, Stack Overflow and the mailing list
Scrapy use cases:
- Build your own perfect scalable SEO crawler
- Scrape keyword positions
- Spy on competitors
- Perform automated SEO based on web scraping
- Create a specific tool for your unique SEO needs
- Replace expensive SEO platforms with personally adjusted scraping tools
- Scrape technical SEO info
- Visualize internal and outbound links
- Build a DMOZ spider
- Scrape websites from public web directories
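
As an illustration of the internal and outbound link use case above, the following sketch separates a page's links by host using only the Python standard library. It is not Scrapy's own API; the names `LinkCollector` and `classify_links` are hypothetical, and treating "same host" as "internal" is a simplifying assumption (subdomains count as outbound here).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(page_url, html):
    """Return (internal, outbound) lists of absolute http(s) links."""
    collector = LinkCollector()
    collector.feed(html)
    site_host = urlparse(page_url).netloc
    internal, outbound = [], []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)   # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue                          # skip mailto:, javascript:, etc.
        if parsed.netloc == site_host:
            internal.append(absolute)
        else:
            outbound.append(absolute)
    return internal, outbound
```

A function like this could be called from a spider callback with the response URL and body text, and the resulting pairs fed into a graph or spreadsheet tool for visualization.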