git:/git.code.sf.net/p/docfetcher/code free download

EasySpider

A visual no-code/code-free web crawler/spider

A visual code-free/no-code web crawler/spider, supporting both Chinese and English.

Downloads: 5 This Week

Last Update: 2025-01-01

miniblink49

Lighter, faster browser kernel of blink to integrate HTML UI in apps

miniblink is an open source, single-file browser control based on Chromium, and currently the smallest known one. Through its exported pure C interface, a browser control can be created in a few lines of code. It can be called from C++, C#, Delphi and other languages. It embeds Node.js and can run Electron. ...

Downloads: 11 This Week

Last Update: 2025-12-13

Python API for JMComic

Python crawler and API for downloading JMComic albums and images

...It includes command-line functionality and configuration files so users can customize download behavior, directory structures, and performance settings without modifying code. It also supports plugin-based extensions that allow additional processing.

Downloads: 1 This Week

Last Update: 2026-04-07

Maxun

Small event-delegation library for decoupling event binding and handling

Maxun, named JsAction by Google, is a lightweight event-delegation library built in JavaScript. It lets developers separate the logic that binds events from the code that handles them, which keeps DOM event wiring cleaner and more maintainable. The repository is archived and marked as read-only, indicating that the project is no longer actively maintained or intended for production use. The README states that ongoing development has migrated into a larger framework under the Angular project. It includes modules for dispatching events, for capturing native events, for custom event details, and for action flows. ...

Downloads: 18 This Week

Last Update: 2026-03-10

crawler

Collection of JS reverse engineering examples for web scraping study

...It contains many case studies that demonstrate how to analyze and replicate request parameters, cookies, and encryption logic used by real websites. Each directory in the project focuses on a specific target service or scenario, showing how browser network requests and JavaScript code can be studied to reproduce API calls programmatically. Many examples illustrate techniques such as debugging scripts, intercepting requests, analyzing encrypted parameters, and understanding authentication flows. crawler also explores common anti-scraping defenses and demonstrates how developers can examine them through debugging tools and reverse engineering techniques.

Downloads: 2 This Week

Last Update: 3 days ago

kimuraframework

AI-first Ruby framework for building fast, flexible web scraping spiders

Kimurai is an open source web scraping framework written in Ruby that simplifies the process of building automated data extraction tools. It provides a clean domain-specific language that allows developers to define scraping logic and data schemas with minimal boilerplate code. Kimurai can use AI-assisted extraction to identify where data resides in HTML pages, automatically generating selectors that are cached for future use so subsequent scraping runs operate with pure Ruby performance. Kimurai supports scraping both static and JavaScript-rendered websites by working with multiple engines, including headless browsers and simple HTTP-based approaches. ...

Downloads: 3 This Week

Last Update: 6 days ago

skycaiji

Open source web scraping system for automated data collection tasks

SkyCaiji is an open source web scraping and data collection system designed to gather information from websites through configurable extraction rules. It focuses on simplifying the process of building crawlers by allowing users to visually define scraping rules rather than writing complex code. It can collect structured or unstructured data from many types of webpages and automate the extraction process for large datasets. SkyCaiji is designed to run on a variety of hosting environments including local machines, shared hosting environments, and cloud servers. It integrates with content management systems so collected data can be published automatically without manual intervention. ...

Downloads: 2 This Week

Last Update: 20 hours ago

Scrapling

An adaptive Web Scraping framework

Scrapling is an adaptive web scraping framework designed to handle everything from a single HTTP request to large-scale, concurrent crawls. Built for modern websites, it intelligently adapts to structural changes by automatically relocating elements when page layouts update. The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques. Its powerful spider system supports multi-session crawling,...

Downloads: 3 This Week

Last Update: 3 days ago

crawlee

A web scraping and browser automation library for Node.js

...Meet our community on Discord. We believe websites are best scraped in the language they're written in. Crawlee runs on Node.js and it's built in TypeScript to improve code completion in your IDE, even if you don't use TypeScript yourself.

Downloads: 1 This Week

Last Update: 2026-02-06

rebrowser-patches

Patches for Puppeteer and Playwright to reduce automation detection

rebrowser-patches is an open source collection of patches designed to improve the stealth capabilities of browser automation frameworks. It focuses primarily on enhancing Puppeteer and Playwright by modifying parts of their source code that may reveal automation activity to websites. Many modern websites rely on bot detection mechanisms that identify automation through behavioral or technical signals, and these patches aim to reduce those detection vectors. By applying targeted fixes, the project helps developers minimize automation leaks that are difficult or impossible to address through configuration options alone. ...

Downloads: 0 This Week

Last Update: 2026-03-11

wombat

Lightweight Ruby DSL for scraping structured data from web pages

Wombat is a lightweight web crawling and scraping library written in Ruby that focuses on extracting structured data from web pages using a concise domain-specific language (DSL). It is designed to simplify the process of defining how information should be collected from HTML documents without requiring large amounts of scraping boilerplate code. Developers can declare the data fields they want and specify selectors or rules for retrieving them, allowing Wombat to parse and return structured results. The DSL approach helps make scraping definitions more readable and maintainable, especially when dealing with multiple fields or nested data structures. Because it is implemented as a Ruby library, it integrates easily into Ruby applications and scripts that need to gather information from web pages. ...

Downloads: 0 This Week

Last Update: 2026-04-07

newspaper4k

Python library for scraping and analyzing online news articles easily

...Newspaper4k supports both single-article extraction and full news site processing, allowing users to build sources representing entire publications and iterate through their articles. It maintains compatibility with the original project so that existing code written for newspaper3k can continue working with minimal changes.

Downloads: 0 This Week

Last Update: 2026-03-11

twitch-batch-downloader

Automate the download of entire Twitch.tv channels

...Save each Twitch video into its own folder, with date and time values, video ID, stream metadata, frame screenshot, .ts parts list and sha256 hash. Keep the original ts files and generate mp4 files from them. It requires a shell and some command line utilities. See README.md for details in the Code/git section.

Downloads: 6 This Week

Last Update: 1 day ago

dude uncomplicated data extraction

dude uncomplicated data extraction: A simple framework

Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, aims to let you build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha, so expect breaking changes. You can run your scraper from the command line by passing URLs, an output filename of your choice, and the paths to your Python scripts to the dude scrape command.
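The decorator-based, Flask-inspired design mentioned above can be illustrated with a minimal standard-library sketch. This is not Dude's actual API: the names select, HANDLERS and scrape are hypothetical, and matching is done on bare tag names only to keep the example short.

```python
# Hypothetical names throughout: this is NOT Dude's API, only an illustration
# of the Flask-style decorator-registration pattern the description mentions.
from html.parser import HTMLParser

HANDLERS = {}  # maps a tag name to the function registered for it


def select(tag):
    """Register the decorated function as the handler for <tag> elements."""
    def decorator(func):
        HANDLERS[tag] = func
        return func
    return decorator


@select("a")
def link(attrs):
    # Receives the element's attributes as a dict and returns one record.
    return {"url": attrs.get("href")}


class _Collector(HTMLParser):
    """Calls the registered handler for every matching start tag."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        handler = HANDLERS.get(tag)
        if handler is not None:
            self.results.append(handler(dict(attrs)))


def scrape(html):
    """Run all registered handlers over an HTML string and collect records."""
    collector = _Collector()
    collector.feed(html)
    collector.close()
    return collector.results
```

The point of the pattern is that adding a new extraction rule means adding one decorated function, with no change to the code that walks the page.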

Downloads: 0 This Week

Last Update: 2024-03-02

Gerapy

Distributed Crawler Management Framework Based on Scrapy

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Anyone who has written crawlers in Python has probably used Scrapy. Scrapy is a very powerful crawler framework with high crawling efficiency and good scalability, and it is practically an essential tool for developing crawlers in Python. With Scrapy you can of course crawl from your own host, but when the crawl is very large, we can’t...

Downloads: 0 This Week

Last Update: 2023-07-19

Easyspider - Distributed Web Crawler

Easy Spider is a distributed Perl Web Crawler Project from 2006

Easy Spider is a distributed Perl Web Crawler Project from 2006. It features code for crawling webpages, distributing the results to a server and generating XML files from them. The client side can be any computer (Windows or Linux) and the server stores all data. Websites that use EasySpider Crawling for Article Writing Software: https://www.artikelschreiber.com/en/ https://www.unaique.net/en/ https://www.unaique.com/ https://www.artikelschreiben.com/ https://www.buzzerstar.com/ https://easyperlspider.sourceforge.io/ https://www.sebastianenger.com/ https://www.artikelschreiber.com/opensource/ It is fun to look at code from a few years ago and to see how one has improved. ...

1 Review

Downloads: 0 This Week

Last Update: 2025-03-16

Scrapyd

A service daemon to run Scrapy spiders

Scrapyd can manage multiple projects, and each project can have multiple versions uploaded, but only the latest one is used for launching new spiders. A common (and useful) convention for the version name is the revision number from the version control tool you use to track your Scrapy project code, for example r23. Versions are not compared alphabetically but with a smarter algorithm (the same one that packaging uses), so r10 compares greater than r9, for example. Scrapyd is an application (typically run as a daemon) that listens for requests to run spiders and spawns a process for each one. It also runs multiple processes in parallel, allocating them to a fixed number of slots given by the max_proc and max_proc_per_cpu options, and starts as many processes as possible to handle the load.
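The two mechanisms described above, non-alphabetical version ordering and a fixed number of process slots, can be sketched as follows. This is an illustration under stated assumptions, not Scrapyd's code: version_key is a simple natural-sort stand-in rather than the algorithm packaging uses, and process_slots assumes that a max_proc of 0 means "derive the slot count from the CPU count times max_proc_per_cpu". Both function names are hypothetical.

```python
import os
import re


def version_key(name):
    """Split a label such as 'r10' into text and integer chunks.

    Natural-sort stand-in (an assumption, not the packaging algorithm):
    digit runs compare as numbers, so 'r10' sorts after 'r9'.
    """
    return [
        (1, int(chunk)) if chunk.isdigit() else (0, chunk)
        for chunk in re.findall(r"\d+|\D+", name)
    ]


def latest_version(versions):
    """Return the version that would be used for launching new spiders."""
    return max(versions, key=version_key)


def process_slots(max_proc, max_proc_per_cpu, cpus=None):
    """Return the number of parallel process slots.

    Assumption: max_proc == 0 means 'not set', in which case the limit is
    the CPU count multiplied by max_proc_per_cpu.
    """
    if max_proc:
        return max_proc
    cpus = cpus or os.cpu_count() or 1
    return cpus * max_proc_per_cpu
```

A plain string comparison would rank r9 above r10 because "9" sorts after "1"; comparing the numeric chunk as an integer is what avoids that.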

Downloads: 0 This Week

Last Update: 2023-04-11

DecryptLogin

Python library providing APIs for automated website login workflows

...It focuses on implementing login mechanisms through HTTP requests, allowing developers to programmatically authenticate with supported services without manually replicating complex login flows. It includes modules that handle different authentication modes such as PC login, mobile login, and QR code login depending on what the target platform supports. DecryptLogin supports a wide variety of online services and platforms, including social media sites, developer platforms, cloud services, and other web portals. Developers can integrate these login routines into automation scripts, crawlers, or data collection tools that require authenticated sessions. ...

Downloads: 0 This Week

Last Update: 4 days ago

pspider

Simple Python framework for building multithreaded web crawlers

PSpider is a lightweight web crawling framework written in Python designed to simplify the development of custom web spiders. It focuses on providing an easy-to-understand architecture while still supporting concurrent crawling for improved performance. It uses a multithreaded model that separates the crawling workflow into several components responsible for fetching, parsing, and saving data. Tasks are managed through queues, allowing different parts of the crawler to process work...
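The queue-connected fetch, parse and save stages described above can be sketched with the standard library alone. This is a minimal illustration of the general pattern, not PSpider's classes or API: stage, run_pipeline and the STOP sentinel are hypothetical names, and the fetch step fabricates a page locally instead of touching the network.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(work, inbox, outbox):
    """Apply `work` to each item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is STOP:
            if outbox is not None:
                outbox.put(STOP)  # propagate shutdown to the next stage
            break
        result = work(item)
        if outbox is not None:
            outbox.put(result)


def run_pipeline(urls):
    """Push URLs through fetch -> parse -> save, one thread per stage."""
    fetch_q, parse_q, save_q = queue.Queue(), queue.Queue(), queue.Queue()
    saved = []

    def fetch(url):
        # Stand-in for an HTTP request: builds a tiny page from the URL.
        return url, f"<title>{url}</title>"

    def parse(page):
        url, html = page
        return url, html[len("<title>"):-len("</title>")]

    def save(record):
        saved.append(record)

    threads = [
        threading.Thread(target=stage, args=(fetch, fetch_q, parse_q)),
        threading.Thread(target=stage, args=(parse, parse_q, save_q)),
        threading.Thread(target=stage, args=(save, save_q, None)),
    ]
    for thread in threads:
        thread.start()
    for url in urls:
        fetch_q.put(url)
    fetch_q.put(STOP)
    for thread in threads:
        thread.join()
    return saved
```

Because each stage only talks to its neighbours through a queue, a slow stage can be given more worker threads without changing the others; the sketch uses one thread per stage so that output order stays deterministic.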

Downloads: 1 This Week

Last Update: 4 days ago

ast-hook-forjs-re

AST-based JavaScript reverse engineering and variable tracing toolkit

...When a user encounters encrypted parameters in network requests, the captured variable data can be searched to determine where those values originated in the code. Once the relevant variable and code location are identified, analysts can trace backward to extract or reproduce the encryption logic used by the site.

Downloads: 2 This Week

Last Update: 2026-03-11

lxspider

Educational Python web scraping case collection for many sites

lxSpider is a collection of web scraping examples designed primarily for learning and experimentation with data extraction techniques. It gathers numerous crawler implementations that demonstrate how to collect data from a wide range of websites and online services. It focuses heavily on practical cases that illustrate how different platforms handle requests, authentication parameters, and anti-scraping protections. lxSpider includes examples targeting areas such as e-commerce platforms,...

Downloads: 0 This Week

Last Update: 2026-03-11

ruia

Async Python framework for fast and flexible web scraping spiders

...Ruia is powered by Python’s asyncio library along with aiohttp, enabling developers to perform concurrent network requests efficiently and scrape data from websites with minimal overhead. Ruia follows a “write less, run faster” philosophy, emphasizing concise code and streamlined spider development. It provides a structured approach to building scraping projects through components such as data items, spiders, middleware, and plugins. Developers can define structured fields to extract information from HTML content and process responses asynchronously to improve crawling performance. It also supports middleware and plugin systems that allow customization of request handling, response processing, and additional functionality.
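The asyncio-based concurrency described above can be illustrated with a small standard-library sketch. It does not use Ruia or aiohttp: fetch is a hypothetical stand-in that sleeps instead of making a request, and crawl, run and the limit parameter are names chosen for this example only.

```python
import asyncio


async def fetch(url, delay=0.01):
    """Stand-in for a network request (an assumption: sleeps, no real I/O)."""
    await asyncio.sleep(delay)
    return {"url": url, "length": len(url)}


async def crawl(urls, limit=2):
    """Fetch all URLs concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))


def run(urls):
    """Synchronous entry point that drives the event loop."""
    return asyncio.run(crawl(urls))
```

While one request is waiting, the event loop switches to another, so total time is closer to the slowest request than to the sum of all requests; the semaphore keeps the crawler from opening an unbounded number of connections.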

Downloads: 1 This Week

Last Update: 2026-03-11

GitGet

Ever wanted to download only a part of a Git repository?

Ever wanted to download only a part of a Git repository? Just paste the URL of the repo you want to download, then sit back and enjoy. This simple Java application uses web scraping to download only the files you need, saving you bandwidth and disk space.

1 Review

Downloads: 0 This Week

Last Update: 2018-09-03

jd-autobuy

Python tool that automates JD.com login and product purchase tasks

...It uses web scraping and HTTP request techniques to log into an account, check product availability, and attempt to purchase specified items automatically. It supports login through methods such as QR code authentication, allowing users to sign in through the platform’s mobile application. Once authenticated, the script can retrieve product details including price, stock status, and item information. It can automatically add items to the shopping cart and prepare an order submission workflow for faster purchasing during high-demand sales or limited stock releases. ...

Downloads: 1 This Week

Last Update: 4 days ago

Gecco

Lightweight Java web crawler framework with jQuery-style extraction

...It integrates several well-known Java libraries and frameworks, including tools for HTTP requests, HTML parsing, JSON processing, and application development. Through its annotation-based design, developers can define crawling rules and data extraction logic directly within Java classes, reducing boilerplate code and improving readability. Gecco also provides mechanisms for handling dynamic web content, including support for asynchronous requests and extraction of JavaScript variables from pages. Gecco emphasizes extensibility and follows an open design that allows additional components and integrations to be added without modifying the core codebase.

Downloads: 0 This Week

Last Update: 2026-03-10

Search Results for "git:/git.code.sf.net/p/docfetcher/code"

Showing 31 open source projects for "git:/git.code.sf.net/p/docfetcher/code"
