Search Results for "sandbox:/mnt/data/project_plan.pod" - Page 2

Showing 50 open source projects for "sandbox:/mnt/data/project_plan.pod"

1

bilili

Command-line Bilibili video and danmaku downloader with batch support

...It focuses on enabling users to retrieve user-uploaded videos as well as serialized content such as bangumi episodes directly from the terminal environment. It provides automated downloading capabilities that handle video streams and associated data efficiently while minimizing manual interaction. bilili supports retrieving both the video files and danmaku comments, which are the scrolling overlay comments commonly associated with the platform’s videos. These danmaku comments can be automatically converted into ASS subtitle format for playback compatibility with media players. bilili also implements multi-threaded and segmented downloading techniques to improve download performance and reliability. ...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
2

dirhunt

Web crawler that finds hidden web directories without brute force

Dirhunt is an open source security tool designed to discover web directories and analyze website structures without relying on brute-force techniques. Instead of sending large numbers of guess-based requests, it operates as a specialized crawler that intelligently explores websites to identify accessible or hidden directories. Dirhunt can detect directories that expose “Index Of” listings, which may reveal files and other resources that were not intended to be publicly visible. It can also...

Downloads: 1 This Week

Last Update: 2026-03-11
See Project
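
The "Index Of" detection mentioned in the Dirhunt description can be illustrated with a minimal sketch. This is not Dirhunt's actual logic; the function name `looks_like_index_of` and the title-based heuristic are assumptions chosen for illustration, based on the way common web servers title their automatic directory listings.

```python
import re

# Hypothetical heuristic: auto-generated directory listings usually carry
# a <title> that starts with "Index of". This is NOT Dirhunt's real detector.
_TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def looks_like_index_of(html):
    """Return True if the page title suggests a directory listing."""
    match = _TITLE_RE.search(html)
    if match is None:
        return False
    return match.group(1).strip().lower().startswith("index of")
```

A real crawler would combine a check like this with other signals (link structure, status codes), since a page title alone can be misleading.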
3

Crawlab

Distributed web crawler admin platform for spider management

...Tasks are scheduled by the task scheduler module in the master node, and received by the task handler module in worker nodes, which executes these tasks in task runners. Task runners are actually processes running spider or crawler programs, and can also send data through gRPC (integrated in SDK) to other data sources, e.g. MongoDB.

Downloads: 3 This Week

Last Update: 2023-07-26
See Project
4

scraper-with-chatgpt

It is a powerful data scraping tool that helps you extract information from various online sources. Easily collect data from Google SERP, Maps, Shopify, Zillow, and more. With a user-friendly interface, you can scrape and save data in JSON or Excel formats. Unlock insights from the web effortlessly with scrape-it.cloud API.

Downloads: 1 This Week

Last Update: 2023-08-28
See Project
5

DecryptLogin

Python library providing APIs for automated website login workflows

...DecryptLogin supports a wide variety of online services and platforms, including social media sites, developer platforms, cloud services, and other web portals. Developers can integrate these login routines into automation scripts, crawlers, or data collection tools that require authenticated sessions. It also provides example utilities and automation scripts demonstrating how the login APIs can be used in practical scenarios.

Downloads: 2 This Week

Last Update: 6 days ago
See Project
6

AutoScraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

This project is made for automatic web scraping to make scraping easy. It gets a URL or the HTML content of a web page and a list of sample data that we want to scrape from that page. This data can be text, URL or any HTML tag value of that page. It learns the scraping rules and returns similar elements. Then you can use this learned object with new URLs to get similar content or the exact same element of those new pages.

Downloads: 0 This Week

Last Update: 2023-04-12
See Project
7

mlscraper

ML-based HTML scraper that learns extraction rules from examples

mlscraper is a Python library designed to automatically extract structured data from HTML pages without requiring developers to manually write CSS selectors or XPath rules. Instead of defining extraction logic by hand, users provide a few examples of the data they want to retrieve from a webpage. It analyzes those examples within the HTML document and determines patterns or rules that can be used to extract the same type of information from similar pages.

Downloads: 2 This Week

Last Update: 4 days ago
See Project
8

pspider

Simple Python framework for building multithreaded web crawlers

...It focuses on providing an easy-to-understand architecture while still supporting concurrent crawling for improved performance. It uses a multithreaded model that separates the crawling workflow into several components responsible for fetching, parsing, and saving data. Tasks are managed through queues, allowing different parts of the crawler to process work asynchronously and efficiently. PSpider defines a set of modules and utility classes that help developers manage crawling tasks, filter URLs, and process scraped content. By organizing crawling tasks into structured stages, PSpider allows developers to build scalable spiders while keeping the codebase relatively compact and readable. ...

Downloads: 1 This Week

Last Update: 6 days ago
See Project
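
The fetch, parse, and save stages connected by queues, as described for PSpider, can be sketched with the standard library alone. This is an illustrative sketch, not PSpider's API: the stage functions, the `PAGES` dictionary (a stand-in for real HTTP requests), and `run_pipeline` are all hypothetical names.

```python
import queue
import threading

# Hypothetical in-memory "site" so the sketch runs without network access.
PAGES = {"/a": "alpha", "/b": "beta", "/c": "gamma"}

_STOP = object()  # sentinel that tells a stage to shut down


def fetch_stage(urls, fetched):
    """Take URLs from the input queue and pass (url, body) downstream."""
    while True:
        url = urls.get()
        if url is _STOP:
            fetched.put(_STOP)  # propagate shutdown to the parser
            break
        fetched.put((url, PAGES.get(url, "")))


def parse_stage(fetched, parsed):
    """Transform each fetched body (here: upper-case it) and pass it on."""
    while True:
        item = fetched.get()
        if item is _STOP:
            parsed.put(_STOP)  # propagate shutdown to the saver
            break
        url, body = item
        parsed.put((url, body.upper()))


def save_stage(parsed, store):
    """Write each parsed result into the shared result dictionary."""
    while True:
        item = parsed.get()
        if item is _STOP:
            break
        url, value = item
        store[url] = value


def run_pipeline(url_list):
    """Run one thread per stage and return the saved results."""
    urls, fetched, parsed = queue.Queue(), queue.Queue(), queue.Queue()
    store = {}
    workers = [
        threading.Thread(target=fetch_stage, args=(urls, fetched)),
        threading.Thread(target=parse_stage, args=(fetched, parsed)),
        threading.Thread(target=save_stage, args=(parsed, store)),
    ]
    for worker in workers:
        worker.start()
    for url in url_list:
        urls.put(url)
    urls.put(_STOP)
    for worker in workers:
        worker.join()
    return store
```

With one thread per stage, a single sentinel is enough to shut the pipeline down in order. Running several fetchers in parallel would need one sentinel per worker or a different shutdown scheme.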
9

Scylla

Intelligent proxy pool for collecting and managing public proxies

...In addition to the API, the system provides a web-based interface where users can view available proxies and monitor their global distribution through a visual dashboard. It is commonly used by developers who need scalable proxy management when gathering data from the internet or building datasets for machine learning.

Downloads: 10 This Week

Last Update: 2026-03-10
See Project
10

instagram-profilecrawl

Instagram profile crawler that extracts posts, tags, and stats

...Authentication is optional, meaning the crawler can access public profile data without logging in.

Downloads: 4 This Week

Last Update: 6 days ago
See Project
11

lxspider

Educational Python web scraping case collection for many sites

lxSpider is a collection of web scraping examples designed primarily for learning and experimentation with data extraction techniques. It gathers numerous crawler implementations that demonstrate how to collect data from a wide range of websites and online services. It focuses heavily on practical cases that illustrate how different platforms handle requests, authentication parameters, and anti-scraping protections. lxSpider includes examples targeting areas such as e-commerce platforms, social media services, content sites, research databases, and information portals. ...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
12

ruia

Async Python framework for fast and flexible web scraping spiders

Ruia is an asynchronous web scraping micro-framework built for Python that focuses on simplicity, speed, and flexibility when creating web crawlers. Ruia is powered by Python’s asyncio library along with aiohttp, enabling developers to perform concurrent network requests efficiently and scrape data from websites with minimal overhead. Ruia follows a “write less, run faster” philosophy, emphasizing concise code and streamlined spider development. It provides a structured approach to building scraping projects through components such as data items, spiders, middleware, and plugins. Developers can define structured fields to extract information from HTML content and process responses asynchronously to improve crawling performance. ...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
13

GoogleScraper

Python tool for scraping search engine results from many providers

...By running automated queries and collecting results in bulk, the project can assist with tasks such as SEO research, trend discovery, or building datasets of websites related to specific keywords. GoogleScraper also includes capabilities for running multiple scraping tasks concurrently to improve performance and increase the amount of collected data.

Downloads: 2 This Week

Last Update: 6 days ago
See Project
14

ECommerceCrawlers

Collection of Python ecommerce and website crawler example projects

ECommerceCrawlers is a collection of practical Python web crawler projects designed to gather data from a variety of ecommerce platforms, websites, and online services. It aggregates many independent crawler examples created by contributors and organized into separate subprojects that target specific sites or data sources. These examples demonstrate how to build and operate web scrapers capable of collecting structured information such as product listings, news content, job postings, social media data, and other publicly available web data. ...

Downloads: 8 This Week

Last Update: 16 hours ago
See Project
15

Requests-HTML

Pythonic HTML Parsing for Humans

...The rest of the code operates the same way as the synchronous version, except that results is a list containing multiple response objects. The same basic process described above can still be applied to extract the data you want.

Downloads: 0 This Week

Last Update: 2023-04-10
See Project
16

django-dynamic-scraper

Creating Scrapy scrapers via the Django admin interface

...Because it simplifies things, DDS is not suitable for every kind of scraper, but it is well suited to the relatively common case of regularly scraping a website with a list of updated items (e.g. news, events) and then following each item's detail page to scrape more information. Django Dynamic Scraper tries to keep its data structures in the database as separate as possible from the models in your app, so it comes with its own Django model classes for defining scrapers, runtime information related to your scraper runs and classes.

Downloads: 0 This Week

Last Update: 2022-09-05
See Project
17

mzitu

Python crawler that downloads image galleries and analyzes titles

...Using text segmentation and frequency analysis, the project can create a word cloud representing common keywords found in the dataset. This makes the repository both a scraping example and a small data analysis experiment built around the collected content. Overall, mzitu serves as a learning-oriented implementation of Python web scraping, data processing, and visualization techniques.

Downloads: 4 This Week

Last Update: 6 days ago
See Project
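
The frequency-analysis step described for mzitu, which feeds the word cloud, can be sketched as follows. This is an assumption-laden illustration rather than the project's code: `keyword_frequencies` is a hypothetical name, and the regex tokenizer stands in for whatever text segmentation the project really uses (Chinese titles would need a dedicated segmenter).

```python
import re
from collections import Counter


def keyword_frequencies(titles, min_length=2):
    """Count how often each token appears across a list of titles.

    Tokens are runs of word characters; tokens shorter than
    `min_length` are ignored.
    """
    counts = Counter()
    for title in titles:
        for token in re.findall(r"\w+", title.lower()):
            if len(token) >= min_length:
                counts[token] += 1
    return counts
```

The resulting `Counter` maps each keyword to its frequency, which is the kind of input a word-cloud renderer typically expects.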
18

Twitter Intelligence

Twitter Intelligence OSINT project performs tracking and analysis

...This project is a Python 3.x application. The package dependencies are listed in requirements.txt and must be installed before running the application. SQLite is used as the database. Tweet data is stored in the Tweet, User, Location, Hashtag, and HashtagTweet tables. The database is created automatically. analysis.py performs the analysis: user, hashtag, and location analyses are run. To display Google Maps, you must set a Google Maps API key in setting.py.

Downloads: 0 This Week

Last Update: 2023-04-12
See Project
19

WeChatSogou

Python library to crawl and retrieve data from WeChat accounts

WechatSogou is an open source Python library designed to retrieve data from WeChat official accounts by using the Sogou WeChat search service as its data source. It provides developers with a programmatic way to search for public accounts and collect article information without manually browsing the search interface. It functions as a crawler interface that sends requests to the search engine, retrieves results, and converts the returned pages into structured data that can be used in applications or analysis pipelines. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
20

pyspider

A powerful Spider(Web Crawler) system in Python

...Alternatively, use MySQL or MongoDB together with RabbitMQ to deploy a distributed crawl cluster. In a production environment, running each component in its own process and storing data in a database service is more reliable and flexible. To run pyspider components in separate processes, you need at least one database service. pyspider currently supports MySQL, MongoDB and PostgreSQL; choose one of them.

Downloads: 0 This Week

Last Update: 2021-03-31
See Project
21

gain

Asyncio-based Python framework for building fast web crawling spiders

...It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results. Developers define crawlers using components such as spiders, parsers, and items, allowing them to organize crawling logic and data extraction rules clearly. Gain supports CSS selectors and XPath expressions for parsing page content and extracting specific elements. Gain also allows developers to configure headers, concurrency levels, and proxy settings to control how crawlers interact with target websites. ...

Downloads: 1 This Week

Last Update: 6 days ago
See Project
22

Toapi

Convert websites into structured APIs automatically with Python tool

Toapi is a Python library designed to transform ordinary websites into usable API services. Instead of building a traditional web crawler that collects and stores data before exposing it through an API, Toapi simplifies the process by allowing developers to define data structures that automatically generate an API layer from existing web pages. It works by parsing HTML content from a source site and mapping selected elements into structured data that can be returned as JSON through API endpoints. ...

Downloads: 1 This Week

Last Update: 16 hours ago
See Project
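
The idea behind Toapi, mapping selected HTML elements to structured JSON, can be sketched with the standard library. This does not use or reproduce Toapi's API: `ClassTextExtractor`, `page_to_json`, and the `(tag, class)` field mapping are hypothetical names chosen for illustration.

```python
import json
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every `tag` element that carries `css_class`."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.results = []
        self._buffer = None  # text chunks while inside a matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes and self._buffer is None:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first closing tag ends the match, so
        # nested elements with the same tag name are not supported.
        if tag == self.tag and self._buffer is not None:
            self.results.append("".join(self._buffer).strip())
            self._buffer = None


def page_to_json(html, fields):
    """Map {field_name: (tag, css_class)} onto `html` and return JSON text."""
    payload = {}
    for name, (tag, css_class) in fields.items():
        parser = ClassTextExtractor(tag, css_class)
        parser.feed(html)
        payload[name] = parser.results
    return json.dumps(payload, sort_keys=True)
```

An API layer would serve the returned JSON from an endpoint. Here the field mapping plays the role of the developer-defined data structure mentioned in the description.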
23

haipproxy

Distributed proxy IP pool for web crawlers using Scrapy and Redis

...It automatically crawls proxy resources from the internet and aggregates them into a centralized pool that can be accessed by distributed spiders and scraping systems. It is built using Python and relies on Scrapy for high-performance crawling while Redis is used for data storage, communication, and task coordination between components. It includes crawlers that discover proxy servers, validators that test proxy availability and performance, and schedulers that manage crawling and validation tasks. HAipproxy aims to maintain a high availability proxy pool with low latency so that scraping frameworks can rotate proxies efficiently and avoid blocking during large-scale data collection. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
24

DSTK - DataScience ToolKit

DSTK - DataScience ToolKit for All of Us

...Of course you may specify JASP for advanced data editing and RapidMiner for advanced prediction modeling. DSTK is written in C#, Java and Python to interface with R, NLTK, and Weka. It can be expanded with plugins using R Scripts. We have also created plugins for more statistical functions, and Big Data Analytics with Microsoft Azure HDInsights (Spark Server) with Livy.

Downloads: 2 This Week

Last Update: 2018-05-08
See Project
25

Web Crawler Security Tool

A web crawler oriented to information security.

...The main task of this tool is to search for and list all the links (pages and files) in a website. The crawler was completely rewritten in v1.0, bringing many improvements: better data visualization, an interactive option to download files, faster crawling, export of the list of found files to a separate file (useful for crawling a site once, then downloading the files and analysing them with FOCA), an output log in Common Log Format (CLF), basic authentication handling, and more. Many of the old features have been reimplemented; the most interesting is the crawler's ability to search for directory indexing.

3 Reviews

Downloads: 0 This Week

Last Update: 2015-10-10
See Project
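
The Common Log Format (CLF) output mentioned above follows the layout `host ident authuser [date] "request" status bytes`. A minimal sketch of producing one such line is shown here. The helper name `format_clf` and its parameters are hypothetical and are not taken from the tool itself.

```python
from datetime import datetime, timedelta, timezone


def format_clf(host, when, method, path, status, size,
               user="-", ident="-", protocol="HTTP/1.0"):
    """Format one request as a Common Log Format line.

    `when` must be a timezone-aware datetime. The month abbreviation
    from %b depends on the process locale (English in the default
    "C" locale). A size of None is written as "-".
    """
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    size_field = "-" if size is None else str(size)
    return (f'{host} {ident} {user} [{timestamp}] '
            f'"{method} {path} {protocol}" {status} {size_field}')


# Usage: a request logged at 13:55:36 in a UTC-7 timezone.
example_time = datetime(2000, 10, 10, 13, 55, 36,
                        tzinfo=timezone(timedelta(hours=-7)))
example_line = format_clf("127.0.0.1", example_time, "GET",
                          "/apache_pb.gif", 200, 2326, user="frank")
```

Writing crawl results in this format lets standard log analysers process them alongside ordinary web-server logs.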

© 2026 Slashdot Media. All Rights Reserved.