Welcome to the A11yCrawler Adventure! This repo focuses on web scraping, crawling, and link processing. It extracts links from web pages and sends them to RabbitMQ for further processing within the EqualifyApp ecosystem.
https://github.com/EqualifyEverything/integration-crawler.git
This repo contains a scraper that extracts links from websites and publishes them to dedicated RabbitMQ queues, where they await further processing. Links to be crawled are read from the launch_crawler RabbitMQ queue.
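As a rough sketch of the link-extraction step (a hypothetical helper for orientation, not the repo's actual code), the core job can be done with Python's built-in `HTMLParser`:

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags; a minimal stand-in for the scraper."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Return every href found in the given HTML snippet."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


print(extract_links('<a href="/about">About</a> <a href="https://example.com">Ex</a>'))
# → ['/about', 'https://example.com']
```

The real scraper in `src/utils/scrape.py` may use a different parser; this only illustrates the shape of the task.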
To start your A11yCrawler Adventure, jump right in by deploying the container!
| Env Var | Default | Options | Notes |
|---|---|---|---|
| APP_PORT | 8083 | Any port number | Doesn't need to be exposed if the API endpoint is not used |
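On the application side, a variable like `APP_PORT` is typically read with a fallback to its documented default. A minimal sketch (assuming the app reads it from the environment; the helper name is hypothetical):

```python
import os


def app_port(env=os.environ) -> int:
    """Return the crawler's listen port, falling back to the default of 8083."""
    return int(env.get("APP_PORT", "8083"))


print(app_port({}))                     # → 8083
print(app_port({"APP_PORT": "9000"}))   # → 9000
```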
Get the standalone container from Docker Hub and unleash the power of the A11yCrawler!
```shell
docker pull equalifyapp/a11y-crawler-adventure:latest
docker run --name a11y-crawler-adventure -p 8086:8086 equalifyapp/a11y-crawler-adventure:latest
```
The Dockerfile installs wget and curl, configures the working directory, and installs the packages listed in requirements.txt. It then defines the APP_PORT environment variable and sets a health check. Finally, it runs src/main.py to kick off the action.
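A Dockerfile matching that description might look roughly like this. This is a sketch for orientation, not the repo's exact file; the base image, paths, and health-check endpoint are all assumptions:

```dockerfile
# Assumed base image; the real one may differ.
FROM python:3.11-slim

# Tools mentioned above
RUN apt-get update && apt-get install -y --no-install-recommends wget curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

ENV APP_PORT=8083

# Hypothetical health endpoint; the real path may differ.
HEALTHCHECK CMD curl -f http://localhost:${APP_PORT}/health || exit 1

CMD ["python", "src/main.py"]
```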
src/main.py listens to the launch_crawler queue and processes incoming messages using the check_queue function, which calls the process_message function found in src/utils/scrape.py.
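The consume-and-dispatch flow could be sketched with pika, a common Python RabbitMQ client. Whether the repo actually uses pika is an assumption, and the message format (`{"url": ...}`) and helper names here are guesses, not the real `check_queue` / `process_message` internals:

```python
import json


def parse_message(body: bytes) -> str:
    """Hypothetical stand-in for process_message: pull the URL out of a
    JSON-encoded queue message like {"url": "https://example.com"}."""
    payload = json.loads(body)
    return payload["url"]


def on_message(ch, method, properties, body):
    """Callback with the signature pika's basic_consume expects."""
    url = parse_message(body)
    print(f"crawling {url}")
    ch.basic_ack(delivery_tag=method.delivery_tag)


def consume_forever():
    """Wiring sketch only: assumes pika is installed and a broker is reachable."""
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.basic_consume(queue="launch_crawler", on_message_callback=on_message)
    channel.start_consuming()
```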
The catch_rabbits function handles consuming messages from the queue.
Results are sent to the landing_crawler and error_crawler queues.
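The split between the two result queues can be pictured as simple routing on crawl outcome. The semantics here are an assumption based on the queue names (successes to landing_crawler, failures to error_crawler):

```python
def route_queue(crawl_succeeded: bool) -> str:
    """Pick the destination queue for a crawl result (assumed semantics)."""
    return "landing_crawler" if crawl_succeeded else "error_crawler"


print(route_queue(True))   # → landing_crawler
print(route_queue(False))  # → error_crawler
```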
watch.py configures the logger for the entire adventure, providing a unified logging system for A11yCrawler and keeping an eye on everything that happens behind the scenes!
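A unified logger like the one watch.py provides is typically set up once with Python's standard logging module. A minimal sketch in that spirit (the logger name, level, and format are assumptions, not watch.py's actual configuration):

```python
import logging


def configure_logger(name: str = "a11y_crawler") -> logging.Logger:
    """One-time setup of a shared logger; repeated calls return the same logger."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


log = configure_logger()
log.info("A11yCrawler Adventure is watching!")
```

Because `logging.getLogger` returns the same object for the same name, every module that calls `configure_logger()` shares one logger, which is what makes the logging "unified."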