Web Scrapers for Mac

View 23 business solutions
  • Easy-to-use online form builder for every business. Icon
    Easy-to-use online form builder for every business.

    Create online forms and publish them. Get an email for each response. Collect data.

    Easy-to-use online form builder for every business. Create online forms and publish them. Get an email for each response. Collect data. Design professional looking forms with JotForm Online Form Builder. Customize with advanced styling options to match your branding. Speed up and simplify your daily work by automating complex tasks with JotForm’s industry leading features. Securely and easily sell products. Collect subscription fees and donations. Being away from your computer shouldn’t stop you from getting the information you need. No matter where you work, JotForm Mobile Forms lets you collect data offline with powerful forms you can manage from your phone or tablet. Get the full power of JotForm at your fingertips. JotForm PDF Editor automatically turns collected form responses into professional, secure PDF documents that you can share with colleagues and customers. Easily generate custom PDF files online!
    Learn More
  • Quality and compliance software for growing life science companies Icon
    Quality and compliance software for growing life science companies

    Unite quality management, product lifecycle, and compliance intelligence to stay continuously audit-ready and accelerate market entry

    Automate gap analysis across FDA, ISO 13485, MDR, and 28+ regulatory standards. Cross-map evidence once, reuse across submissions. Get real-time risk alerts and board-ready dashboards, so you can expand into new markets with confidence
    Learn More
  • 1
    WebMagic

    WebMagic

    A scalable web crawler framework for Java

    WebMagic is a scalable crawler framework. It covers the whole lifecycle of crawler, downloading, url management, content extraction and persistent. It can simplify the development of a specific crawler. WebMagic is a simple but scalable crawler framework. You can develop a crawler easily based on it. WebMagic has a simple core with high flexibility, a simple API for html extracting. It also provides annotation with POJO to customize a crawler, and no configuration is needed. Some other features include the fact that it is multi-thread and has distribution support. WebMagic is very easy to integrate. Add dependencies to your pom.xml. WebMagic use slf4j with slf4j-log4j12 implementation. If you customized your slf4j implementation, please exclude slf4j-log4j12. You can write a class implementation of PageProcessor.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 2
    crawly

    crawly

    High-level web crawling and scraping framework for Elixir apps

    Crawly is a high-level application framework for crawling websites and extracting structured data using the Elixir programming language. It provides a complete environment for building web crawlers that systematically visit pages, collect information, and transform that data into structured formats for further processing. Crawly is designed for tasks such as data mining, information processing, and building historical archives of web content. Crawly follows the Elixir and OTP architecture model, enabling concurrent and fault-tolerant crawling processes that can handle many requests efficiently. Developers define specialized components called spiders to control how pages are visited and how information is extracted from them. It also supports extensibility through middlewares, pipelines, and fetchers that allow customization of request handling, data processing, and crawling behavior.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 3
    CEF Python

    CEF Python

    Python bindings for the Chromium Embedded Framework (CEF)

    Python bindings for the Chromium Embedded Framework (CEF). CEF Python is an open source project founded by Czarek Tomczak in 2012 to provide Python bindings for the Chromium Embedded Framework (CEF). The Chromium project focuses mainly on Google Chrome application development while CEF focuses on facilitating embedded browser use cases in third-party applications. Lots of applications use CEF control, there are more than 100 million CEF instances installed around the world. There are numerous use cases for CEF. Use it as a modern HTML5 based rendering engine that can act as a replacement for classic desktop GUI frameworks. Think of it as Electron for Python. Embed a web browser widget in a classic Qt / GTK / wxPython desktop application. Use it for automated testing of web applications with more advanced capabilities than Selenium web browser automation due to CEF low level programming APIs.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    Gecco

    Gecco

    Lightweight Java web crawler framework with jQuery-style extraction

    Gecco is a lightweight web crawler framework written in Java that simplifies the process of building web scraping applications. It is designed to make crawler development straightforward by allowing developers to extract page elements using jQuery-style selectors rather than complex parsing logic. It integrates several well-known Java libraries and frameworks, including tools for HTTP requests, HTML parsing, JSON processing, and application development. Through its annotation-based design, developers can define crawling rules and data extraction logic directly within Java classes, reducing boilerplate code and improving readability. Gecco also provides mechanisms for handling dynamic web content, including support for asynchronous requests and extraction of JavaScript variables from pages. Gecco emphasizes extensibility and follows an open design that allows additional components and integrations to be added without modifying the core codebase.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Point of Sale. Powerful and Simple. Icon
    Point of Sale. Powerful and Simple.

    For retail store owners and multi-location retail operations needing a tool to manage sales, inventory, staff and channels in one place

    Vibe Retail is an all-in-one retail point-of-sale and operations platform built for single-store and multi-location retailers seeking to unify inventory, sales, staff and customer data from one mobile-friendly interface. The system lets you track inventory across locations and warehouses, handle item variations (size, color, material), manage purchase orders and supplier deliveries, print custom barcodes, and transfer stock between stores in real time. On the sales side, Vibe supports multiple payment types (cards, cash, checks, gift cards, EBT), layaway workflows, serial number tracking, delivery management, loyalty programs and branded receipts. Retailers can integrate with online platforms (such as Shopify and WooCommerce), sync in-store and online sales, access 40+ real-time reports on sales, inventory and performance, set up promotions and discounts, and print receipts from mobile devices.
    Learn More
  • 5
    Ulixee Hero

    Ulixee Hero

    The web browser built for scraping

    It's the first modern headless browsers designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in NodeJS allowing you bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning fast rendering. Emulators make it easy to disguise your script as practically any browser.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 6
    ast-hook-forjs-re

    ast-hook-forjs-re

    AST-based JavaScript reverse engineering and variable tracing toolkit

    ast-hook-for-js-RE is an open source JavaScript reverse engineering toolkit designed to help analysts locate and understand client-side encryption logic used by web applications. It works by intercepting browser traffic through a local proxy server and modifying JavaScript code before it executes in the browser. Using Abstract Syntax Tree (AST) transformations, it injects hook functions into the code to monitor variable assignments and other runtime changes during execution. This allows ast-hook-for-js-RE to capture variable values in memory and store them in a searchable database, effectively enabling variable-level monitoring of program execution. When a user encounters encrypted parameters in network requests, the captured variable data can be searched to determine where those values originated in the code. Once the relevant variable and code location are identified, analysts can trace backward to extract or reproduce the encryption logic used by the site.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    crawler

    crawler

    Collection of JS reverse engineering examples for web scraping study

    crawler is a collection of web scraping and JavaScript reverse engineering examples designed for learning how modern websites protect their data and how those protections can be analyzed. It contains many case studies that demonstrate how to analyze and replicate request parameters, cookies, and encryption logic used by real websites. Each directory in the project focuses on a specific target service or scenario, showing how browser network requests and JavaScript code can be studied to reproduce API calls programmatically. Many examples illustrate techniques such as debugging scripts, intercepting requests, analyzing encrypted parameters, and understanding authentication flows. crawler also explores common anti-scraping defenses and demonstrates how developers can examine them through debugging tools and reverse engineering techniques.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    instagram-profilecrawl

    instagram-profilecrawl

    Instagram profile crawler that extracts posts, tags, and stats

    instagram-profilecrawl is a Python-based automation script designed to collect publicly available information from Instagram profiles. It crawls profile data such as follower counts, post information, hashtags, and other engagement-related metadata. It operates by automating a web browser using Selenium and performing requests to gather structured information from the platform. instagram-profilecrawl can analyze multiple usernames in a single run and store the extracted information locally in structured formats such as JSON. The collected data can include profile metadata, post details, engagement metrics, and commenter activity, allowing users to analyze account behavior or monitor profile growth over time. It also provides scripts for downloading images from crawled profiles and logging statistics into CSV files for tracking metrics like followers, likes, and comments. Authentication is optional, meaning the crawler can access public profile data without logging in.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    kimuraframework

    kimuraframework

    AI-first Ruby framework for building fast, flexible web scraping spide

    Kimurai is an open source web scraping framework written in Ruby that simplifies the process of building automated data extraction tools. It provides a clean domain-specific language that allows developers to define scraping logic and data schemas with minimal boilerplate code. Kimurai can use AI-assisted extraction to identify where data resides in HTML pages, automatically generating selectors that are cached for future use so subsequent scraping runs operate with pure Ruby performance. Kimurai supports scraping both static and JavaScript-rendered websites by working with multiple engines, including headless browsers and simple HTTP-based approaches. Developers can also interact with pages using browser automation features such as form filling, clicking elements, or navigating through dynamic content. It includes tools for scheduling, parallel scraping, and structured data output, making it suitable for building reliable large-scale crawlers.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Document generation engine that can be used to generate PDF and Word documents from custom software applications Icon
    Document generation engine that can be used to generate PDF and Word documents from custom software applications

    With Docmosis you can quickly and easily add document generation and reporting to your software application.

    Documents are generated from templates which can be created using Microsoft Word or LibreOffice. These templates utilize simple placeholder fields to handle text, repeating and conditional logic with flexible formatting features to control how data and images are injected into the documents.
    Free Trial
  • 10
    mlscraper

    mlscraper

    ML-based HTML scraper that learns extraction rules from examples

    mlscraper is a Python library designed to automatically extract structured data from HTML pages without requiring developers to manually write CSS selectors or XPath rules. Instead of defining extraction logic by hand, users provide a few examples of the data they want to retrieve from a webpage. It analyzes those examples within the HTML document and determines patterns or rules that can be used to extract the same type of information from similar pages. Once trained, the generated scraper can process new pages and return the extracted data in structured formats such as dictionaries or lists. This approach simplifies web scraping tasks by shifting the focus from rule-writing to example-based training. Internally, the project processes HTML documents, identifies relevant elements in the DOM, and builds extraction logic based on statistical or heuristic analysis of the training samples. The result is a developer-oriented tool that aims to automate common scraping workflows.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    pxer

    pxer

    Pixiv crawler userscript for downloading artwork and galleries easily

    Pxer is an open source tool designed to help users collect and download artwork from the Pixiv illustration platform. It is implemented primarily in client-side JavaScript and runs directly in the browser through a userscript environment, allowing it to integrate seamlessly with Pixiv pages. Pxer provides functionality to crawl and gather images, artwork metadata, and other related content from supported Pixiv pages. It is designed to be accessible even for users who are not developers, emphasizing ease of use and quick setup through browser extensions such as Tampermonkey. Once installed, Pxer adds controls to compatible Pixiv pages so users can trigger batch retrieval of illustrations and related assets. It focuses on making it easier to obtain large collections of artworks from artists, bookmarks, rankings, and search results. Its browser-based approach means the tool can operate without requiring a standalone desktop application while still providing powerful scraping.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 12
    serverless-chrome

    serverless-chrome

    Run headless Chrome/Chromium on AWS Lambda

    Serverless Chrome contains everything you need to get started running headless Chrome on AWS Lambda (possibly Azure and GCP Functions soon). The aim of this project is to provide the scaffolding for using Headless Chrome during a serverless function invocation. Serverless Chrome takes care of building and bundling the Chrome binaries and making sure Chrome is running when your serverless function executes. In addition, this project also provides a few example services for common patterns (e.g. taking a screenshot of a page, printing to PDF, some scraping, etc.). Why? Because it's neat. It also opens up interesting possibilities for using the Chrome DevTools Protocol (and tools like Chromeless or Puppeteer) in serverless architectures and doing testing/CI, web-scraping, pre-rendering, etc. You must configure your AWS credentials either by defining AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environmental variables, or using an AWS profile.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 13
    skycaiji

    skycaiji

    Open source web scraping system for automated data collection tasks

    SkyCaiji is an open source web scraping and data collection system designed to gather information from websites through configurable extraction rules. It focuses on simplifying the process of building crawlers by allowing users to visually define scraping rules rather than writing complex code. It can collect structured or unstructured data from many types of webpages and automate the extraction process for large datasets. SkyCaiji is designed to run on a variety of hosting environments including local machines, shared hosting environments, and cloud servers. It integrates with content management systems so collected data can be published automatically without manual intervention. SkyCaiji also supports automated workflows that continuously gather data and process it based on defined collection rules. Its architecture enables users to build scalable web scraping pipelines that can run unattended once configured.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    wombat

    wombat

    Lightweight Ruby DSL for scraping structured data from web pages

    Wombat is a lightweight web crawling and scraping library written in Ruby that focuses on extracting structured data from web pages using a concise domain-specific language (DSL). It is designed to simplify the process of defining how information should be collected from HTML documents without requiring large amounts of scraping boilerplate code. Developers can declare the data fields they want and specify selectors or rules for retrieving them, allowing Wombat to parse and return structured results. The DSL approach helps make scraping definitions more readable and maintainable, especially when dealing with multiple fields or nested data structures. Because it is implemented as a Ruby library, it integrates easily into Ruby applications and scripts that need to gather information from web pages. Wombat also includes examples and tests that demonstrate how scraping definitions can be written and executed within Ruby environments.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    CommunityScrapers

    CommunityScrapers

    This is a public repository containing scrapers

    Stash Community Scrapers is a large open-source collection of metadata extraction tools designed to work with the Stash media management platform, enabling automated scraping of content information from various online sources. The repository contains hundreds of scraper definitions written primarily in YAML and Python, each tailored to extract structured metadata such as titles, performers, tags, and media details from specific websites. These scrapers integrate directly into Stash, allowing users to enrich their media libraries with accurate and detailed information without manual entry. The project supports both automatic installation through in-app feeds and manual configuration for advanced use cases. Some scrapers require additional configuration such as API keys or cookies, highlighting its flexibility and adaptability to different sources.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    CyberScraper 2077

    CyberScraper 2077

    A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

    CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini and LocalLLM Models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 17
    RED HAWK

    RED HAWK

    All-in-one reconnaissance and vulnerability scanning toolkit for sites

    RED HAWK is an open source command-line security tool designed for information gathering, vulnerability scanning, and web reconnaissance tasks. It combines multiple scanning and analysis capabilities into a single toolkit to help security researchers and penetration testers quickly analyze a target website. It can collect a wide range of information about domains, servers, and web applications, including network details, hosting configuration, and content management system detection. It also provides vulnerability scanning features that help identify potential issues such as error-based SQL injection vulnerabilities and sensitive file exposure. RED HAWK includes utilities for performing DNS lookups, port scans, subdomain discovery, and reverse IP analysis, giving users a comprehensive view of a target environment. In addition to vulnerability detection, RED HAWK offers crawling features that gather links and metadata from websites to support deeper reconnaissance.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 18
    Save For Offline

    Save For Offline

    Android app for saving webpages for offline reading

    Android app for saving webpages for offline reading. Save For Offline is an Android app for saving full web pages for offline reading, with lots of features and options. In you web browser selects 'Share', and then 'Save For Offline'. Saves real HTML files which can be opened in other apps/devices. Download & save entire web pages with all assets for offline reading & viewing. Save HTML files in a custom directory. Save in the background, no need to wait for it to finish saving. Night mode, with both a dark theme, and can invert colors when viewing pages (White becomes black and vice-versa). A search of saved pages (By title only for now). User-agent change allows the saving of either desktop or mobile versions of pages. Nice UI for both phones and tablets, with various choices for layout and appearance.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 19
    Scrapy-Redis

    Scrapy-Redis

    Redis-based components for Scrapy

    You can start multiple spider instances that share a single redis queue. Best suitable for broad multi-domain crawls. Scraped items gets pushed into a redis queued meaning that you can start as many as needed post-processing processes sharing the items queue. Scheduler + Duplication Filter, Item Pipeline, Base Spiders. Default requests serializer is pickle, but it can be changed to any module with loads and dumps functions. Note that pickle is not compatible between python versions. Version 0.3 changed the requests serialization from marshal to cPickle, therefore persisted requests using version 0.2 will not able to work on 0.3. The class scrapy_redis.spiders.RedisSpider enables a spider to read the urls from redis. The urls in the redis queue will be processed one after another, if the first request yields more requests, the spider will process those requests before fetching another url from redis.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    Scweet

    Scweet

    Scrape tweets, profiles, followers and following from Twitter/X

    Scweet is a Python-based Twitter/X scraping library and CLI designed to collect tweets, profile timelines, followers, following lists, and user profile data without requiring the official Twitter/X API or a developer account. Instead of depending on deprecated unauthenticated scraping methods, it works by using X’s web GraphQL API together with authenticated browser cookies, which gives it a more current and practical approach for data extraction. The project supports a broad set of collection patterns, including searches by keyword, hashtag, user, date range, engagement thresholds, language, and location, making it useful for research, monitoring, and data gathering workflows. It is built for both local use and higher-volume runs, with support for proxies, dedicated accounts, and multi-account cookie handling to improve reliability at scale. Scweet also includes asynchronous method variants, a command-line interface, automatic credential persistence in a local database, etc.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    Weibo Crawler

    Weibo Crawler

    Python crawler for collecting and downloading Sina Weibo user data

    weibo-crawler is a Python-based data collection tool designed to retrieve information from Sina Weibo user accounts. It automates the process of gathering posts, user profile details, and engagement metrics from one or more target accounts. weibo-crawler can extract comprehensive information about users, including profile attributes such as nickname, follower count, following count, and account metadata. It also captures detailed data about each post, including the content, publishing time, topics, mentions, likes, reposts, and comments. In addition to textual data, the project can download original media from posts, such as images, videos, and Live Photo content. Collected data can be exported to structured formats such as CSV or JSON or stored in databases for further analysis and research. It supports incremental crawling so users can periodically collect only newly published posts, making it useful for ongoing monitoring or dataset updates.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    python-fxxk-spider

    python-fxxk-spider

    Collection of 100+ Python web scraping projects and crawler examples

    python-fxxk-spider is a curated collection of Python web scraping and crawler projects gathered in a single repository for reference and learning. It aggregates many independent scraping examples that target a wide range of websites, online services, and public data sources. Instead of being a single crawler tool, it functions as a catalog of ready-made Python spider implementations that demonstrate different scraping techniques. python-fxxk-spider includes scrapers for social media, e-commerce platforms, job listings, music services, video platforms, and various content sites. Because websites frequently change their structure, some included projects may require adjustments before they can run successfully. It is designed as a long-term, continuously updated list of practical crawler implementations that developers can study, modify, and adapt to their own scraping tasks. It also highlights the importance of legal and responsible use of web scraping technologies.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    spider_collection

    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages. Several scripts also incorporate multi-threading and proxy usage to improve scraping efficiency and help avoid common anti-scraping limitations. In addition to raw data collection, some spiders include basic data processing and analysis using tools such as pandas and simple visualization with matplotlib. It also contains examples of proxy pool integration and encapsulation to support more reliable crawling when working with sites that enforce request limits.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    tumblr-crawler

    tumblr-crawler

    Python crawler to download photos and videos from Tumblr blogs

    tumblr-crawler is an open source Python-based utility designed to download media content from Tumblr blogs. It provides a script that automatically retrieves photos and videos from specified Tumblr sites and saves them locally for offline access. Users can specify one or multiple blogs to crawl by editing a configuration file or by passing parameters through the command line. Once executed, the script fetches media from the Tumblr API and stores the downloaded files in folders named after each blog. tumblr-crawler avoids re-downloading files that have already been saved, making repeated runs safe and useful for recovering missing media. It also supports optional proxy configuration, which can help when access to Tumblr content requires routing requests through a proxy server. With simple dependencies and straightforward configuration, the project offers a practical way to archive media content from Tumblr blogs.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    xsrfprobe

    xsrfprobe

    Advanced toolkit for detecting and exploiting CSRF vulnerabilities

    XSRFProbe is an advanced security auditing toolkit designed to detect and analyze Cross Site Request Forgery (CSRF/XSRF) vulnerabilities in web applications. It uses an automated crawling engine that continuously scans a target application, collects forms and endpoints, and evaluates them for potential CSRF weaknesses. XSRFProbe performs numerous systematic checks to determine whether a web endpoint is vulnerable, including inspection of anti-CSRF tokens, cookie validation behavior, and request forgery scenarios. It also analyzes the strength and randomness of security tokens using algorithms such as entropy calculations to determine whether tokens can be predicted or forged. When a vulnerability is discovered, it can automatically generate proof-of-concept payloads that demonstrate how the flaw could be exploited in a real attack scenario. XSRFProbe provides a highly automated workflow while still allowing users to customize scanning behavior and configuration settings.
    Downloads: 2 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB