Search Results for "sandbox:/mnt/data/project_plan.pod"

Showing 101 open source projects for "sandbox:/mnt/data/project_plan.pod"

1

Firecrawl

Turn entire websites into LLM-ready markdown or structured data

Firecrawl is an API service that takes a URL, crawls it, and converts it into clean, LLM-ready markdown or structured data. Built by Mendable.ai and the Firecrawl community, it includes scraping, crawling, and data extraction capabilities. It crawls all accessible subpages and returns clean data for each; no sitemap is required.

Downloads: 11 This Week

Last Update: 2026-04-10
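Firecrawl's description says it finds all accessible subpages without needing a sitemap. One common way to do that is to follow the links on each fetched page. The Python sketch below shows only that link-discovery step, using the standard library's `html.parser`. `LinkCollector` and `discover_subpages` are hypothetical names for illustration, not part of Firecrawl's API, and treating only same-host links as subpages is an assumption.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover_subpages(page_url: str, html: str) -> List[str]:
    """Return absolute same-host http(s) links found in html, in first-seen order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc
    found: List[str] = []
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" so /docs and /docs#intro match.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        # Keep only web links on the same host (assumed definition of "subpage").
        if parsed.scheme in ("http", "https") and parsed.netloc == host and absolute not in found:
            found.append(absolute)
    return found
```

For example, given links to `/docs`, `/docs#intro`, and `https://other.example/` on the page `https://example.com/`, `discover_subpages` returns `['https://example.com/docs']`: the fragment link collapses into the same page and the external link is dropped. A crawler would feed each newly found URL back through the same step.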
2

Scrapy

A fast, high-level web crawling and web scraping framework

...It can be used for data mining, monitoring and automated testing.

Downloads: 14 This Week

Last Update: 2026-04-09
3

skycaiji

Open source web scraping system for automated data collection tasks

SkyCaiji is an open source web scraping and data collection system designed to gather information from websites through configurable extraction rules. It focuses on simplifying the process of building crawlers by allowing users to visually define scraping rules rather than writing complex code. It can collect structured or unstructured data from many types of webpages and automate the extraction process for large datasets.

Downloads: 3 This Week

Last Update: 2 days ago
4

douyin

Open source Douyin crawler for collecting and downloading public data

DouyinCrawler is an open source data collection tool designed to gather publicly available information from the Douyin platform. It demonstrates how to build a Python-based web crawler combined with a graphical interface and command line functionality. It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages.

Downloads: 4 This Week

Last Update: 2026-03-13
5

kimuraframework

AI-first Ruby framework for building fast, flexible web scraping spiders

Kimurai is an open source web scraping framework written in Ruby that simplifies the process of building automated data extraction tools. It provides a clean domain-specific language that allows developers to define scraping logic and data schemas with minimal boilerplate code. Kimurai can use AI-assisted extraction to identify where data resides in HTML pages, automatically generating selectors that are cached for future use so subsequent scraping runs operate with pure Ruby performance. ...

Downloads: 3 This Week

Last Update: 2026-04-14
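Kimurai's description mentions inferring selectors once with AI assistance and caching them so later runs skip that step. The Python sketch below illustrates that caching pattern under stated assumptions: `SelectorCache` is a hypothetical class, the `infer` callable stands in for whatever expensive inference step is used (it is not Kimurai's API), and a JSON file is assumed as the cache store so selectors survive between runs.

```python
import json
from pathlib import Path
from typing import Callable, Dict


class SelectorCache:
    """Stores inferred selectors in a JSON file so later runs skip inference."""

    def __init__(self, path: Path, infer: Callable[[str, str], str]) -> None:
        self._path = path
        # Hypothetical expensive step (e.g. an LLM call): (field, sample_html) -> selector.
        self._infer = infer
        self._cache: Dict[str, str] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def selector_for(self, site: str, field: str, sample_html: str) -> str:
        key = f"{site}::{field}"
        if key not in self._cache:
            # Cache miss: run inference once, then persist the result to disk.
            self._cache[key] = self._infer(field, sample_html)
            self._path.write_text(json.dumps(self._cache, indent=2))
        # Cache hit: a plain dictionary lookup, no inference call.
        return self._cache[key]
```

The first `selector_for("shop.example", "price", html)` call runs `infer` and writes the result to disk; a new `SelectorCache` built from the same file in a later run returns the stored selector without calling `infer` again.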
6

dxy-covid-19-crawler

Realtime crawler for COVID-19 outbreak statistics from DXY data

...DXY-COVID-19-Crawler automatically crawls data at regular intervals, typically every minute, ensuring that newly published statistics are captured as quickly as possible. Retrieved data is stored in MongoDB and archived so that the entire progression of the outbreak can be traced over time. It also provided an API that allowed developers to easily access the collected data for building dashboards, visualizations, and other analytical tools.

Downloads: 1 This Week

Last Update: 5 days ago
7

Weibo Crawler

Python crawler for collecting and downloading Sina Weibo user data

...It also captures detailed data about each post, including the content, publishing time, topics, mentions, likes, reposts, and comments. In addition to textual data, the project can download original media from posts, such as images, videos, and Live Photo content. Collected data can be exported to structured formats such as CSV or JSON or stored in databases for further analysis and research.

Downloads: 2 This Week

Last Update: 6 days ago
8

spider_collection

Collection of Python web scraping scripts for data extraction tasks

...In addition to raw data collection, some spiders include basic data processing and analysis using tools such as pandas and simple visualization with matplotlib. It also contains examples of proxy pool integration and encapsulation to support more reliable crawling when working with sites that enforce request limits.

Downloads: 3 This Week

Last Update: 6 days ago
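spider_collection's description mentions proxy pool integration for sites that enforce request limits. A minimal Python sketch of one common pool design follows: proxies are handed out round-robin and retired after repeated failures with no intervening success. `ProxyPool`, its methods, the `max_failures` threshold, and the proxy addresses in the example are hypothetical and not taken from the project.

```python
from collections import deque
from typing import Deque, Dict, Iterable


class ProxyPool:
    """Round-robin proxy rotation that retires proxies after repeated failures."""

    def __init__(self, proxies: Iterable[str], max_failures: int = 3) -> None:
        self._queue: Deque[str] = deque(proxies)
        self._failures: Dict[str, int] = {p: 0 for p in self._queue}
        self._max_failures = max_failures

    def next_proxy(self) -> str:
        if not self._queue:
            raise RuntimeError("proxy pool is empty")
        proxy = self._queue[0]
        self._queue.rotate(-1)  # move the chosen proxy to the back of the line
        return proxy

    def report_failure(self, proxy: str) -> None:
        self._failures[proxy] = self._failures.get(proxy, 0) + 1
        if self._failures[proxy] >= self._max_failures and proxy in self._queue:
            self._queue.remove(proxy)  # retire a proxy that keeps failing

    def report_success(self, proxy: str) -> None:
        self._failures[proxy] = 0  # a success resets the failure count
```

With `max_failures=2`, two `report_failure` calls for the same proxy remove it, after which `next_proxy()` keeps returning the remaining proxy. A crawler would call `report_success` after each successful request so that occasional errors do not retire a healthy proxy.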
9

watercrawl

AI-ready web crawler that extracts and structures website content

...WaterCrawl supports customizable extraction rules so users can focus only on relevant elements while ignoring unnecessary page components. WaterCrawl also offers real-time monitoring capabilities, allowing users to track crawling progress, performance metrics, and errors during large data collection jobs. Developers can integrate the tool into applications through a REST API and multiple client SDKs, enabling automated data pipelines and AI data preparation workflows.

Downloads: 2 This Week

Last Update: 2026-03-11
See Project
World class QA, 100% done-for-you
For engineering teams in search of a solution to design, manage and maintain E2E tests for their apps

MuukTest is a test automation service that combines our own proprietary, AI-powered software with expert QA services to help you achieve world class test automation at a fraction of the in-house costs.

Learn More
10

rvest

Simple web scraping for R

rvest helps you scrape (or harvest) data from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like Beautiful Soup and RoboBrowser. If you’re scraping multiple pages, I highly recommend using rvest in concert with polite. The polite package ensures that you’re respecting robots.txt and not hammering the site with too many requests.

Downloads: 2 This Week

Last Update: 2025-08-29
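The polite package is described as respecting robots.txt and not sending too many requests. Both ideas can be sketched in Python with the standard library's `urllib.robotparser`. This illustrates the concept only and is not the polite package's API; `PoliteFetcher`, the user agent string, and the 5-second fallback delay are assumptions.

```python
import time
from typing import Iterable, Optional
from urllib.robotparser import RobotFileParser


class PoliteFetcher:
    """Checks robots.txt rules and spaces out requests to a single host."""

    def __init__(self, robots_lines: Iterable[str], user_agent: str,
                 default_delay: float = 5.0) -> None:
        self.user_agent = user_agent
        self.parser = RobotFileParser()
        self.parser.parse(robots_lines)
        # Prefer the site's own Crawl-delay directive when it declares one.
        declared = self.parser.crawl_delay(user_agent)
        self.delay = float(declared) if declared is not None else default_delay
        self._last_request: Optional[float] = None

    def allowed(self, url: str) -> bool:
        return self.parser.can_fetch(self.user_agent, url)

    def wait_turn(self) -> None:
        """Sleep just long enough to keep `delay` seconds between requests."""
        now = time.monotonic()
        if self._last_request is not None:
            remaining = self.delay - (now - self._last_request)
            if remaining > 0:
                time.sleep(remaining)
        self._last_request = time.monotonic()
```

Here the rules are parsed from a list of lines so the example runs offline; against a live site one would call `RobotFileParser.set_url()` and `read()` to download the file, then call `wait_turn()` before each request to a URL for which `allowed()` returns `True`.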
11

Geziyor

Blazing fast Go framework for web crawling and data scraping tasks

...It is designed to help developers crawl websites and extract structured information from web pages efficiently. It focuses on speed and scalability, allowing large numbers of requests to be processed concurrently. Geziyor supports use cases such as data mining, monitoring web content, and automated testing workflows. It provides a flexible architecture where developers define parsing functions that process responses and extract the desired data. Geziyor includes features for managing requests, handling cookies, respecting robots rules, and exporting collected data in multiple formats. ...

Downloads: 2 This Week

Last Update: 6 days ago
12

Ferret

Declarative web scraping

Ferret is a web scraping system that aims to simplify data extraction from the web. It has a declarative query language that makes it easy to focus on the data you need. Ferret can scrape JavaScript-rendered pages, handle all page events, and emulate user interactions. It was designed as a library from the ground up and can easily be embedded into any Go application. Ferret uses Chrome/Chromium via the Chrome DevTools Protocol to handle dynamically rendered web pages. It is highly extensible, and creating custom functions and types is straightforward. ...

Downloads: 1 This Week

Last Update: 2025-05-07
13

miniblink49

Lighter, faster Blink-based browser kernel for integrating HTML UI into apps

miniblink is an open source, single-file, and currently the smallest known Chromium-based browser control. Through its exported pure C interface, a browser control can be created in just a few lines of code, and it can be called from C++, C#, Delphi, and other languages. It embeds Node.js and supports Electron (with Node.js, can run...

Downloads: 8 This Week

Last Update: 2025-12-13
14

DotnetSpider

Lightweight .NET framework for fast web crawling and data scraping

DotnetSpider is a web crawling and data extraction framework built on the .NET Standard platform. It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing collected data. ...

Downloads: 0 This Week

Last Update: 2026-03-10
15

CyberScraper 2077

A powerful web scraper powered by LLMs | OpenAI, Gemini & Ollama

CyberScraper 2077 is an AI-powered web scraper with a cyberpunk theme. It uses OpenAI, Gemini, and local LLM models to extract the data you need from websites.

Downloads: 2 This Week

Last Update: 2026-01-20
16

Scweet

Scrape tweets, profiles, followers and following from Twitter/X

Scweet is a Python-based Twitter/X scraping library and CLI designed to collect tweets, profile timelines, followers, following lists, and user profile data without requiring the official Twitter/X API or a developer account. Instead of depending on deprecated unauthenticated scraping methods, it works by using X’s web GraphQL API together with authenticated browser cookies, which gives it a more current and practical approach for data extraction. The project supports a broad set of collection patterns, including searches by keyword, hashtag, user, date range, engagement thresholds, language, and location, making it useful for research, monitoring, and data gathering workflows. ...

Downloads: 3 This Week

Last Update: 6 days ago
17

wombat

Lightweight Ruby DSL for scraping structured data from web pages

Wombat is a lightweight web crawling and scraping library written in Ruby that focuses on extracting structured data from web pages using a concise domain-specific language (DSL). It is designed to simplify the process of defining how information should be collected from HTML documents without requiring large amounts of scraping boilerplate code. Developers can declare the data fields they want and specify selectors or rules for retrieving them, allowing Wombat to parse and return structured results. ...

Downloads: 0 This Week

Last Update: 2026-04-07
18

fess

Open source enterprise search server for websites, files, and data

Fess is an open source enterprise search server designed to provide powerful full-text search capabilities across multiple data sources. It enables organizations to quickly deploy a scalable search environment without requiring deep knowledge of underlying search technologies. Fess is built on top of OpenSearch and offers an integrated solution for crawling, indexing, and searching documents from websites, file systems, and various data stores. Fess includes a built-in crawler that can collect content from sources such as databases, CSV files, and shared storage, making it suitable for centralized knowledge discovery. ...

Downloads: 1 This Week

Last Update: 4 days ago
19

Spider

High-performance Rust web crawler and scraper for large-scale data

...Spider can operate concurrently across many pages, allowing it to gather large datasets in a short period of time. Spider also provides mechanisms for subscribing to crawl events so developers can process page data such as URLs, status codes, or HTML content as it is discovered. It supports advanced capabilities such as headless browser rendering, background crawling tasks, and configurable rules that control crawl depth or ignored paths. These capabilities make the project suitable for building search indexers, data extraction pipelines, and SEO analysis tools.

Downloads: 1 This Week

Last Update: 2026-03-31
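Spider's description mentions configurable rules that limit crawl depth or ignore paths. The Python sketch below shows how such rules can shape a breadth-first crawl; it is not Spider's Rust API. `crawl` is a hypothetical function, `fetch_links` stands in for fetching a page and extracting its links, and matching ignored paths by prefix is an assumption.

```python
from collections import deque
from typing import Callable, Iterable, List, Sequence
from urllib.parse import urlparse


def crawl(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],
    max_depth: int,
    ignored_prefixes: Sequence[str] = (),
) -> List[str]:
    """Breadth-first crawl that stops at max_depth and skips ignored paths.

    Returns URLs in the order they were visited.
    """
    visited: List[str] = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # visit this page but do not follow its links
        for link in fetch_links(url):
            path = urlparse(link).path
            if link in seen or any(path.startswith(p) for p in ignored_prefixes):
                continue  # already queued, or excluded by an ignore rule
            seen.add(link)
            queue.append((link, depth + 1))
    return visited
```

For example, with a start page linking to `/a` and `/admin/login`, `/a` linking to `/a/b`, and `/a/b` linking to `/a/b/c`, calling `crawl` with `max_depth=2` and `ignored_prefixes=["/admin"]` visits `/`, `/a`, and `/a/b`, skipping the admin page and stopping before `/a/b/c`.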
20

Snoop Project

The most powerful software of its kind with coverage of the CIS region

...Snoop is a research project (with its own database and a closed bug bounty) in the field of searching for and processing public data on the Internet. In terms of specialized search, Snoop can compete with traditional search engines.

Downloads: 4 This Week

Last Update: 2026-01-01
21

diskover-community

Open source file indexing & storage analytics powered by Elasticsearch

...Diskover also helps identify outdated or unused files, duplicate data, and inefficient storage usage that can waste resources or increase operational costs. A Python-based indexing engine performs the scanning and indexing tasks.

Downloads: 1 This Week

Last Update: 2026-03-11
22

FinalRecon

All-in-one Python web reconnaissance tool for fast target analysis

FinalRecon is an all-in-one web reconnaissance tool written in Python that helps security professionals gather information about a target website quickly and efficiently. It combines multiple reconnaissance techniques into a single command-line utility so users do not need to run several separate tools to collect similar data. FinalRecon focuses on providing a fast overview of a web target while maintaining accuracy in the collected results. It includes modules for gathering server information, analyzing SSL certificates, performing WHOIS lookups, and crawling website resources. FinalRecon can also enumerate DNS records, discover subdomains, search for directories and files, and scan common network ports. ...

Downloads: 2 This Week

Last Update: 3 days ago
23

finvizfinance

Finviz analysis Python library

...stock charts, fundamental and technical information, insider information, and stock news; forex charts and performance; and crypto charts and performance. The Screener and Group modules provide data frames for comparing stocks according to different filters and trading signals. It can also retrieve information about an individual stock (fundamentals, description, outer rating, stock news, and insider trading).

Downloads: 10 This Week

Last Update: 2026-01-03
24

jsoup

Java library for working with real-world HTML

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make every attempt to create a clean parse from the HTML you provide, regardless of whether the HTML is well-formed or not. ...

Downloads: 0 This Week

Last Update: 2 days ago
25

changedetection.io

The best free open source website change detection and restock service

Used by shoppers, data journalists, research engineers, data scientists, security researchers, and more. It handles everything from simply monitoring website pages for changes (such as watching prices and restock notifications) to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. You can monitor out-of-stock products and get restock alerts via Discord, Slack, email, and many other platforms when those products are back in stock. ...

Downloads: 6 This Week

Last Update: 2 days ago
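At its core, website change detection compares the current version of a page with the previous one. A minimal Python sketch of that idea follows: hash the whitespace-normalized page text, compare it with the last stored hash, and optionally require a trigger phrase such as "In stock" before alerting. `fingerprint` and `ChangeWatcher` are hypothetical names, not changedetection.io's implementation, and fetching the page and extracting its text are left out.

```python
import hashlib
import re
from typing import Dict, Optional


def fingerprint(text: str) -> str:
    """Hash page text after collapsing whitespace, so reflowed markup is not a change."""
    normalized = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ChangeWatcher:
    """Remembers the last fingerprint per URL and reports when it differs."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def check(self, url: str, text: str, trigger: Optional[str] = None) -> bool:
        """Return True when the text changed since the previous check.

        If trigger is given, only report a change whose new text contains it
        (e.g. "In stock" for a restock alert).
        """
        new = fingerprint(text)
        old = self._last.get(url)
        self._last[url] = new
        if old is None or old == new:
            return False  # first sighting, or nothing changed
        return trigger is None or trigger in text
```

Checking a product page that goes from "Out of stock" to "Out of   stock" (whitespace only) reports no change, while a later switch to "In stock" with `trigger="In stock"` returns `True` once and `False` on the next unchanged check.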

© 2026 Slashdot Media. All Rights Reserved.
