Showing 221 open source projects for "web crawler"

  • 1
    crawlergo

    Headless Chrome crawler for collecting URLs for vulnerability scans

    crawlergo is a browser-based web crawler designed to collect URLs and request data that can be used by web vulnerability scanning tools. It uses a Chrome headless environment to render web pages and observe behavior during the DOM rendering stage in order to capture as many accessible endpoints as possible. By monitoring the page lifecycle and interacting with web elements, the crawler automatically triggers JavaScript events and navigational actions that would normally occur during real user interaction. ...
    Downloads: 0 This Week
  • 2
    grab-site

    Web crawler for archiving and backing up sites into WARC archives

    grab-site is an open source web crawling tool designed to archive and back up websites by recursively downloading their content. It works by taking a starting URL and systematically following links across the site, capturing pages and resources and saving them into WARC archive files for long-term preservation. Internally, the crawler uses a fork of the wpull engine to fetch and process web pages efficiently during large-scale crawls. grab-site includes a built-in dashboard that displays real-time crawl activity, including which URLs are currently being processed and how many remain in the queue. ...
    Downloads: 0 This Week
  • 3
    DecryptLogin

    Python library providing APIs for automated website login workflows

    DecryptLogin is a Python library designed to simplify automated login processes for many popular websites by providing ready-to-use APIs that simulate authentication behavior. It focuses on implementing login mechanisms through HTTP requests, allowing developers to programmatically authenticate with supported services without manually replicating complex login flows. It includes modules that handle different authentication modes such as PC login, mobile login, and QR code login depending on...
    Downloads: 0 This Week
  • 4
    Hakrawler

    Fast Go web crawler for discovering URLs and web app endpoints

    hakrawler is a lightweight command-line web crawler built in Go that is designed to quickly discover URLs, endpoints, and assets within web applications. It is primarily used during the reconnaissance phase of security testing, bug bounty hunting, and penetration testing. It works by automatically crawling web pages and extracting links, JavaScript file locations, and other resources that may reveal additional attack surface or hidden functionality. hakrawler is implemented as a simple and efficient crawler using the Gocolly library, which allows it to perform fast and concurrent crawling of web pages. ...
    Downloads: 0 This Week
  • 5
    pspider

    Simple Python framework for building multithreaded web crawlers

    PSpider is a lightweight web crawling framework written in Python designed to simplify the development of custom web spiders. It focuses on providing an easy-to-understand architecture while still supporting concurrent crawling for improved performance. It uses a multithreaded model that separates the crawling workflow into several components responsible for fetching, parsing, and saving data.
    Downloads: 1 This Week
  • 6
    instagram-profilecrawl

    Instagram profile crawler that extracts posts, tags, and stats

    ...It also provides scripts for downloading images from crawled profiles and logging statistics into CSV files for tracking metrics like followers, likes, and comments. Authentication is optional, meaning the crawler can access public profile data without logging in.
    Downloads: 3 This Week
  • 7
    appcrawler

    Automated mobile app crawler and testing tool built on Appium

    AppCrawler is an automated mobile application testing tool designed to explore and interact with app user interfaces automatically. Built on top of the Appium automation framework, it systematically crawls through application screens and performs actions such as clicking buttons, navigating menus, and interacting with UI elements to simulate user behavior. It is commonly used for automated functional testing, UI exploration, and detecting crashes or unexpected behaviors in mobile...
    Downloads: 1 This Week
  • 8
    Abdal Web Traffic Generator

    Create useful statistics and traffic on your site

    This tool generates statistics and traffic for your site and can help improve its ranking on statistics sites such as Alexa.
    Downloads: 1 This Week
  • 9
    ReconSpider

    Most Advanced Open Source Intelligence (OSINT) Framework

    ...Reconnaissance is a mission to obtain information, by various detection methods, about the activities and resources of an enemy or potential enemy, or about the geographic characteristics of a particular area. A web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of web indexing (web spidering).
    Downloads: 8 This Week
  • 10
    Abot

    Fast and flexible C# framework for building customizable web crawlers

    Abot is an open source C# web crawler framework designed to help developers efficiently crawl and process web content. It focuses on speed, flexibility, and extensibility while handling the complex low-level tasks involved in web crawling. It manages essential components such as multithreading, HTTP requests, scheduling, and link parsing so developers can focus on processing the collected data.
    Downloads: 0 This Week
  • 11
    gocrawl

    Polite concurrent web crawler library for Go with flexible hooks

    gocrawl is a lightweight web crawling library written in the Go programming language that enables developers to build custom web crawlers and data extraction tools. gocrawl focuses on providing a minimal yet powerful crawling engine that can be easily extended and adapted for different web scraping or indexing tasks. It is designed to be polite when accessing websites by respecting crawling rules such as robots.txt policies and applying crawl delays for each host. It executes requests...
    Downloads: 1 This Week
  • 12
    lxspider

    Educational Python web scraping case collection for many sites

    lxSpider is a collection of web scraping examples designed primarily for learning and experimentation with data extraction techniques. It gathers numerous crawler implementations that demonstrate how to collect data from a wide range of websites and online services. It focuses heavily on practical cases that illustrate how different platforms handle requests, authentication parameters, and anti-scraping protections. lxSpider includes examples targeting areas such as e-commerce platforms, social media services, content sites, research databases, and information portals. ...
    Downloads: 0 This Week
  • 13
    CEF Python

    Python bindings for the Chromium Embedded Framework (CEF)

    CEF Python is an open source project founded by Czarek Tomczak in 2012 to provide Python bindings for the Chromium Embedded Framework (CEF). The Chromium project focuses mainly on Google Chrome application development, while CEF focuses on facilitating embedded browser use cases in third-party applications. Many applications use CEF controls; there are more than 100 million CEF instances installed around the world. There are...
    Downloads: 3 This Week
  • 14
    Mowglee

    Mowglee - The Geo Crawler!

    Mowglee is a distributed, multi-threaded, asynchronous, task-execution-based web crawler in Java. It is designed for geographic affinity and is highly modular.
    Downloads: 0 This Week
  • 15
    NY Times

    A Simple Demonstration of the New York Times App

    NY Times is a minimal news 🗞 Android application built to demonstrate the use of jsoup with modern Android development tools.
    Downloads: 0 This Week
  • 16
    proxypool

    Proxy crawler that aggregates, tests, and serves usable proxy nodes

    ...The behavior of the crawler and the sources it scans can be configured through configuration files, enabling users to customize how nodes are gathered and maintained. It also supports scheduled crawling to continuously update the proxy list and keep the pool current with newly discovered nodes.
    Downloads: 17 This Week
  • 17
    RED HAWK

    All-in-one reconnaissance and vulnerability scanning toolkit for sites

    RED HAWK is an open source command-line security tool designed for information gathering, vulnerability scanning, and web reconnaissance tasks. It combines multiple scanning and analysis capabilities into a single toolkit to help security researchers and penetration testers quickly analyze a target website. It can collect a wide range of information about domains, servers, and web applications, including network details, hosting configuration, and content management system detection. It also...
    Downloads: 2 This Week
  • 18
    PHP mini vulnerability suite

    Multiple server/webapp vulnerability scanner

    GitHub: https://github.com/samedog/phpmvs
    Downloads: 0 This Week
  • 19
    magnetW

    Magnet link aggregation search

    magnetW follows the rule-based approach of magnetX: the search results from each magnet-link site are formatted uniformly. The project has no user group; it uses GitHub only for code hosting and related technical discussion, and any other address may be risky, so please distinguish carefully. This project is open source and free. It has no fundraising channels of any kind, such as donations, and no advertising of any kind. If you encounter anything similar to the above, please don't believe...
    Downloads: 1 This Week
  • 20
    WebSploit Framework

    WebSploit is a high-level MITM framework

    WebSploit Advanced MITM Framework
    [+] Autopwn - uses Metasploit to scan and exploit target services
    [+] wmap - scans and crawls targets using the Metasploit wmap plugin
    [+] format infector - injects reverse and bind payloads into file formats
    [+] phpMyAdmin Scanner
    [+] CloudFlare resolver
    [+] LFI Bypasser
    [+] Apache Users Scanner
    [+] Dir Bruter
    [+] admin finder
    [+] MLITM Attack - Man Left In The Middle, XSS Phishing Attacks
    [+] MITM - Man In The Middle Attack
    [+] Java Applet Attack
    [+] MFOD Attack Vector
    [+] ARP DoS Attack
    [+] Web Killer Attack
    [+] Fake Update Attack
    [+] Fake Access Point Attack
    [+] Wi-Fi Honeypot
    [+] Wi-Fi Jammer
    [+] Wi-Fi DoS
    [+] Wi-Fi Mass De-Authentication Attack
    [+] Bluetooth POD Attack
    Project on GitHub: https://github.com/websploit
    Downloads: 9 This Week
  • 21
    BotSlayer

    BotSlayer Community Edition

    BotSlayer is an application that helps track and detect potential manipulation of information spreading on Twitter. The tool is developed by the Observatory on Social Media at Indiana University, the same lab that brought you Botometer and Hoaxy. BotSlayer is not a tool for detecting and removing likely social bots from your list of Twitter followers or friends; for that purpose, check out Botometer. If you just want to visualize the spread of some piece of information, consider Hoaxy. ...
    Downloads: 0 This Week
  • 22
    ECommerceCrawlers

    Collection of Python ecommerce and website crawler example projects

    ECommerceCrawlers is a collection of practical Python web crawler projects designed to gather data from a variety of ecommerce platforms, websites, and online services. It aggregates many independent crawler examples created by contributors and organized into separate subprojects that target specific sites or data sources. These examples demonstrate how to build and operate web scrapers capable of collecting structured information such as product listings, news content, job postings, social media data, and other publicly available web data. ...
    Downloads: 9 This Week
  • 23
    X-RAY

    The next web scraper. See through the <html> noise.

    Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't...
    Downloads: 0 This Week
  • 24
    Photon

    Incredibly fast crawler designed for OSINT

    Photon is an extremely fast web crawler built specifically for OSINT and reconnaissance use cases. It is designed to extract URLs, endpoints, files, and other intelligence artifacts from target websites with minimal overhead. The crawler prioritizes speed and breadth, making it suitable for mapping web attack surfaces and discovering hidden resources. Photon is commonly used during early reconnaissance phases to build a comprehensive inventory of reachable assets. ...
    Downloads: 6 This Week
  • 25
    ShadowSocksShare

    Python ShadowSocks framework

    This project crawls websites that share ss(r) accounts, parses the shared accounts, verifies their connectivity, and redistributes the working ones through a generated subscription link. Because Google+ was scheduled to close on April 2, 2019, and almost all of the usable accounts crawled until then came from Google+, anyone building their own website with this project should keep an eye on its updates and redeploy using the latest source code.
    Downloads: 0 This Week
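The recursive crawl described for grab-site (start from one URL, follow links across the site, and keep a queue of URLs still to be processed) is essentially a breadth-first traversal. The Python sketch below illustrates only that general technique; it is not grab-site's code, which uses a fork of wpull and writes WARC files. The in-memory `SITE` link map and the `crawl` function are hypothetical stand-ins for real HTTP fetching and HTML parsing.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "site": each page URL maps to the hrefs found on it.
# A real crawler would fetch each page over HTTP and parse its HTML instead.
SITE = {
    "https://example.com/": ["/about", "/blog/", "https://other.org/"],
    "https://example.com/about": ["/"],
    "https://example.com/blog/": ["post-1", "post-2", "/about"],
    "https://example.com/blog/post-1": ["../"],
    "https://example.com/blog/post-2": [],
}


def crawl(start_url: str) -> list[str]:
    """Breadth-first crawl that only follows links on the start URL's host."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])  # frontier: URLs still to visit
    seen = {start_url}          # every URL ever queued, so none is queued twice
    visited = []                # visit order (where a real tool would archive the page)
    while queue:
        url = queue.popleft()
        visited.append(url)
        for href in SITE.get(url, []):
            link = urljoin(url, href)  # resolve relative links against the page URL
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
        print(f"done {url}; {len(queue)} remaining in queue")
    return visited
```

Marking URLs as seen when they are queued, rather than when they are visited, keeps a page that is linked from several places from entering the queue more than once; the printed queue length mirrors the "how many remain" figure such dashboards report.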
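hakrawler is described as crawling pages and extracting links and JavaScript file locations. hakrawler itself is written in Go on top of Colly; the Python sketch below only illustrates that extraction step using the standard library's `html.parser`, and the `LinkExtractor` class and `extract` function are hypothetical names, not part of any of the listed tools.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href> links and <script src> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.scripts: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None for bare attributes
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "script" and attrs.get("src"):
            self.scripts.append(urljoin(self.base_url, attrs["src"]))


def extract(base_url: str, html: str) -> tuple[list[str], list[str]]:
    """Return (links, script URLs) found in one HTML page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.scripts
```

Resolving every reference against the page URL with `urljoin` is what turns relative paths such as `js/app.js` into endpoints that a scanner can request directly.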
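PSpider's description separates the crawling workflow into fetching, parsing, and saving components that run on multiple threads. A minimal sketch of that producer/consumer layout, using only the standard library `queue` and `threading` modules, is shown below. It is not PSpider's API: `fetch`, `parse`, and `run_pipeline` are hypothetical, and `fetch` returns fake content instead of making HTTP requests.

```python
import queue
import threading


# Hypothetical stand-ins: a real spider would issue HTTP requests and parse HTML.
def fetch(url: str) -> str:
    return f"<html>{url}</html>"


def parse(url: str, content: str) -> dict:
    return {"url": url, "length": len(content)}


def run_pipeline(urls: list[str], num_fetchers: int = 4) -> list[dict]:
    url_q: queue.Queue = queue.Queue()      # URLs waiting to be fetched
    content_q: queue.Queue = queue.Queue()  # (url, content) pairs waiting to be parsed
    item_q: queue.Queue = queue.Queue()     # parsed items waiting to be saved
    saved: list[dict] = []

    def fetcher() -> None:
        while (url := url_q.get()) is not None:  # None is the stop sentinel
            content_q.put((url, fetch(url)))

    def parser() -> None:
        while (job := content_q.get()) is not None:
            item_q.put(parse(*job))

    def saver() -> None:
        while (item := item_q.get()) is not None:
            saved.append(item)  # a real saver would write to a file or database

    fetchers = [threading.Thread(target=fetcher) for _ in range(num_fetchers)]
    parse_thread = threading.Thread(target=parser)
    save_thread = threading.Thread(target=saver)
    for t in [*fetchers, parse_thread, save_thread]:
        t.start()

    for url in urls:
        url_q.put(url)
    for _ in fetchers:
        url_q.put(None)       # one sentinel per fetcher thread
    for t in fetchers:
        t.join()              # once fetchers exit, all content has been queued
    content_q.put(None)
    parse_thread.join()       # once the parser exits, all items have been queued
    item_q.put(None)
    save_thread.join()
    return saved
```

Shutting the stages down in order, with a sentinel sent only after the upstream threads have been joined, guarantees that no queued work is dropped; the order of saved items still depends on thread scheduling.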
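gocrawl is described as polite: it respects robots.txt and applies a crawl delay per host. gocrawl is a Go library, so the following is only a Python sketch of those two rules using the standard library's `urllib.robotparser`, not gocrawl's API. The `PoliteScheduler` class, the `example-bot` user agent, and the `ROBOTS_TXT` content are hypothetical, and for simplicity one robots.txt is applied to every URL; a real crawler would fetch and cache a separate robots.txt per host.

```python
import time
from typing import Optional
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, parsed from a string instead of being fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


class PoliteScheduler:
    """Checks robots.txt rules and enforces a minimum delay between requests per host."""

    def __init__(self, robots_lines: list[str], user_agent: str = "example-bot",
                 default_delay: float = 1.0) -> None:
        self.user_agent = user_agent
        self.robots = RobotFileParser()
        self.robots.parse(robots_lines)
        delay = self.robots.crawl_delay(user_agent)
        # Fall back to default_delay when robots.txt sets no Crawl-delay.
        self.delay = float(delay) if delay is not None else default_delay
        self.last_request: dict[str, float] = {}  # host -> time of last request

    def allowed(self, url: str) -> bool:
        return self.robots.can_fetch(self.user_agent, url)

    def wait_time(self, url: str, now: Optional[float] = None) -> float:
        """Seconds to wait before this URL's host may be requested again."""
        now = time.monotonic() if now is None else now
        last = self.last_request.get(urlparse(url).netloc)
        if last is None:
            return 0.0
        return max(0.0, last + self.delay - now)

    def record(self, url: str, now: Optional[float] = None) -> None:
        """Remember when a request to this URL's host was made."""
        self.last_request[urlparse(url).netloc] = time.monotonic() if now is None else now
```

A crawl loop would skip URLs for which `allowed` is false, call `time.sleep(scheduler.wait_time(url))` before each request, and then call `record(url)`. The optional `now` argument exists only so the timing logic can be checked without real waiting.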