Search Results for "sandbox:/mnt/data/project_plan.pod" - Page 2

Showing 50 open source projects for "sandbox:/mnt/data/project_plan.pod"

View related business solutions
  • Securing the Cloud Made Easy Icon
    Securing the Cloud Made Easy

    Multi-cloud security delivered — now and in the future.

    Designed for organizations operating in the cloud who need complete, centralized visibility of their entire cloud estate and want more time and resources dedicated to remediating the actual risks that matter, Orca Security is an agentless cloud Security Platform that provides security teams with 100% coverage their entire cloud environment.
    Learn More
  • Fully managed relational database service for MySQL, PostgreSQL, and SQL Server Icon
    Fully managed relational database service for MySQL, PostgreSQL, and SQL Server

    Focus on your application, and leave the database to us

    Cloud SQL manages your databases so you don't have to, so your business can run without disruption. It automates all your backups, replication, patches, encryption, and storage capacity increases to give your applications the reliability, scalability, and security they need.
    Try for free
  • 1
    bilili

    bilili

    Command-line Bilibili video and danmaku downloader with batch support

    ...It focuses on enabling users to retrieve user-uploaded videos as well as serialized content such as bangumi episodes directly from the terminal environment. It provides automated downloading capabilities that handle video streams and associated data efficiently while minimizing manual interaction. bilili supports retrieving both the video files and danmaku comments, which are the scrolling overlay comments commonly associated with the platform’s videos. These danmaku comments can be automatically converted into ASS subtitle format for playback compatibility with media players. bilili also implements multi-threaded and segmented downloading techniques to improve download performance and reliability. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    dirhunt

    dirhunt

    Web crawler that finds hidden web directories without brute force

    Dirhunt is an open source security tool designed to discover web directories and analyze website structures without relying on brute-force techniques. Instead of sending large numbers of guess-based requests, it operates as a specialized crawler that intelligently explores websites to identify accessible or hidden directories. Dirhunt can detect directories that expose “Index Of” listings, which may reveal files and other resources that were not intended to be publicly visible. It can also...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    Crawlab

    Crawlab

    Distributed web crawler admin platform for spiders management

    ...Tasks are scheduled by the task scheduler module in the master node, and received by the task handler module in worker nodes, which executes these tasks in task runners. Task runners are actually processes running spider or crawler programs, and can also send data through gRPC (integrated in SDK) to other data sources, e.g. MongoDB.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    scraper-with-chatgpt
    It is a powerful data scraping tool that helps you extract information from various online sources. Easily collect data from Google SERP, Maps, Shopify, Zillow, and more. With a user-friendly interface, you can scrape and save data in JSON or Excel formats. Unlock insights from the web effortlessly with scrape-it.cloud API.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Workload Automation for Global Enterprises Icon
    Workload Automation for Global Enterprises

    Orchestrate Your Entire Tech Stack with Redwood RunMyJobs

    Redwood lets you orchestrate securely and reliably across any application, service or server, in the cloud or on-premise, all inside a single platform.
    Learn More
  • 5
    DecryptLogin

    DecryptLogin

    Python library providing APIs for automated website login workflows

    ...DecryptLogin supports a wide variety of online services and platforms, including social media sites, developer platforms, cloud services, and other web portals. Developers can integrate these login routines into automation scripts, crawlers, or data collection tools that require authenticated sessions. It also provides example utilities and automation scripts demonstrating how the login APIs can be used in practical scenarios.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    AutoScraper

    AutoScraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    This project is made for automatic web scraping to make scraping easy. It gets a URL or the HTML content of a web page and a list of sample data that we want to scrape from that page. This data can be text, URL or any HTML tag value of that page. It learns the scraping rules and returns similar elements. Then you can use this learned object with new URLs to get similar content or the exact same element of those new pages.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    mlscraper

    mlscraper

    ML-based HTML scraper that learns extraction rules from examples

    mlscraper is a Python library designed to automatically extract structured data from HTML pages without requiring developers to manually write CSS selectors or XPath rules. Instead of defining extraction logic by hand, users provide a few examples of the data they want to retrieve from a webpage. It analyzes those examples within the HTML document and determines patterns or rules that can be used to extract the same type of information from similar pages.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    pspider

    pspider

    Simple Python framework for building multithreaded web crawlers

    ...It focuses on providing an easy-to-understand architecture while still supporting concurrent crawling for improved performance. It uses a multithreaded model that separates the crawling workflow into several components responsible for fetching, parsing, and saving data. Tasks are managed through queues, allowing different parts of the crawler to process work asynchronously and efficiently. PSpider defines a set of modules and utility classes that help developers manage crawling tasks, filter URLs, and process scraped content. By organizing crawling tasks into structured stages, PSpider allows developers to build scalable spiders while keeping the codebase relatively compact and readable. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    Scylla

    Scylla

    Intelligent proxy pool for collecting and managing public proxies

    ...In addition to the API, the system provides a web-based interface where users can view available proxies and monitor their global distribution through a visual dashboard. It is commonly used by developers who need scalable proxy management when gathering data from the internet or building datasets for machine learning.
    Downloads: 10 This Week
    Last Update:
    See Project
  • A warehouse and inventory management software that scales with your business. Icon
    A warehouse and inventory management software that scales with your business.

    For leading 3PLs and high-volume brands searching for an advanced WMS

    Logiwa is a leader in cloud-native fulfillment technology, revolutionizing high-volume fulfillment for third-party logistics (3PLs), B2B and B2C fulfillment networks, and direct-to-consumer brands. Our flagship product, Logiwa IO, is an advanced Fulfillment Management System (FMS) designed to scale operations in the digital era. Logiwa elevates digital warehousing to new heights, ensuring dynamic and efficient fulfillment processes. Our commitment to AI-driven technology, combined with a focus on customer-centricity, equips businesses to adeptly navigate and excel in rapidly changing market landscapes. Discover the future of smart fulfillment and how you can fulfill brilliantly with Logiwa IO.
    Learn More
  • 10
    instagram-profilecrawl

    instagram-profilecrawl

    Instagram profile crawler that extracts posts, tags, and stats

    ...Authentication is optional, meaning the crawler can access public profile data without logging in.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 11
    lxspider

    lxspider

    Educational Python web scraping case collection for many sites

    lxSpider is a collection of web scraping examples designed primarily for learning and experimentation with data extraction techniques. It gathers numerous crawler implementations that demonstrate how to collect data from a wide range of websites and online services. It focuses heavily on practical cases that illustrate how different platforms handle requests, authentication parameters, and anti-scraping protections. lxSpider includes examples targeting areas such as e-commerce platforms, social media services, content sites, research databases, and information portals. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    ruia

    ruia

    Async Python framework for fast and flexible web scraping spiders

    Ruia is an asynchronous web scraping micro-framework built for Python that focuses on simplicity, speed, and flexibility when creating web crawlers. Ruia is powered by Python’s asyncio library along with aiohttp, enabling developers to perform concurrent network requests efficiently and scrape data from websites with minimal overhead. Ruia follows a “write less, run faster” philosophy, emphasizing concise code and streamlined spider development. It provides a structured approach to building scraping projects through components such as data items, spiders, middleware, and plugins. Developers can define structured fields to extract information from HTML content and process responses asynchronously to improve crawling performance. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    GoogleScraper

    GoogleScraper

    Python tool for scraping search engine results from many providers

    ...By running automated queries and collecting results in bulk, the project can assist with tasks such as SEO research, trend discovery, or building datasets of websites related to specific keywords. GoogleScraper also includes capabilities for running multiple scraping tasks concurrently to improve performance and increase the amount of collected data.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    ECommerceCrawlers

    ECommerceCrawlers

    Collection of Python ecommerce and website crawler examples projects

    ECommerceCrawlers is a collection of practical Python web crawler projects designed to gather data from a variety of ecommerce platforms, websites, and online services. It aggregates many independent crawler examples created by contributors and organized into separate subprojects that target specific sites or data sources. These examples demonstrate how to build and operate web scrapers capable of collecting structured information such as product listings, news content, job postings, social media data, and other publicly available web data. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 15
    Requests-HTML

    Requests-HTML

    Pythonic HTML Parsing for Humans

    ...The rest of the code operates the same way as the synchronous version except that results is a list containing multiple response objects however the same basic processes can be applied as above to extract the data you want.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    django-dynamic-scraper

    django-dynamic-scraper

    Creating Scrapy scrapers via the Django admin interface

    ...Since it simplifies things DDS is not usable for all kinds of scrapers, but it is well suited for the relatively common case of regularly scraping a website with a list of updated items (e.g. news, events, etc.) and then dig into the detail page to scrape some more infos for each item. Django Dynamic Scraper tries to keep its data structure in the database as separated as possible from the models in your app, so it comes with its own Django model classes for defining scrapers, runtime information related to your scraper runs and classes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    mzitu

    mzitu

    Python crawler that downloads image galleries and analyzes titles

    ...Using text segmentation and frequency analysis, the project can create a word cloud representing common keywords found in the dataset. This makes the repository both a scraping example and a small data analysis experiment built around the collected content. Overall, mzitu serves as a learning-oriented implementation of Python web scraping, data processing, and visualization techniques.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 18
    Twitter Intelligence

    Twitter Intelligence

    Twitter Intelligence OSINT project performs tracking and analysis

    ...This project is a Python 3.x application. The package dependencies are in the file requirements.txt. Run that command to install the dependencies. SQLite is used as the database. Tweet data is stored on the Tweet, User, Location, Hashtag, HashtagTweet tables. The database is created automatically. analysis.py performs analysis processing. User, hashtag, and location analyzes are performed. You must write Google Map API Key in setting.py to display Google Maps.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    WeChatSogou

    WeChatSogou

    Python library to crawl and retrieve data from WeChat accounts

    WechatSogou is an open source Python library designed to retrieve data from WeChat official accounts by using the Sogou WeChat search service as its data source. It provides developers with a programmatic way to search for public accounts and collect article information without manually browsing the search interface. It functions as a crawler interface that sends requests to the search engine, retrieves results, and converts the returned pages into structured data that can be used in applications or analysis pipelines. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    pyspider

    pyspider

    A powerful Spider(Web Crawler) system in Python

    ...Or using MySQL or MongoDB and RabbitMQ to deploy a distributed crawl cluster. To deploy pyspider in product environment, running component in each process and store data in database service is more reliable and flexible. To deploy pyspider components in each single processes, you need at least one database service. pyspider now supports MySQL, MongoDB and PostgreSQL. You can choose one of them.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    gain

    gain

    Asyncio-based Python framework for building fast web crawling spiders

    ...It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results. Developers define crawlers using components such as spiders, parsers, and items, allowing them to organize crawling logic and data extraction rules clearly. Gain supports CSS selectors and XPath expressions for parsing page content and extracting specific elements. Gain also allows developers to configure headers, concurrency levels, and proxy settings to control how crawlers interact with target websites. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    Toapi

    Toapi

    Convert websites into structured APIs automatically with Python tool

    Toapi is a Python library designed to transform ordinary websites into usable API services. Instead of building a traditional web crawler that collects and stores data before exposing it through an API, Toapi simplifies the process by allowing developers to define data structures that automatically generate an API layer from existing web pages. It works by parsing HTML content from a source site and mapping selected elements into structured data that can be returned as JSON through API endpoints. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    haipproxy

    haipproxy

    Distributed proxy IP pool for web crawlers using Scrapy and Redis

    ...It automatically crawls proxy resources from the internet and aggregates them into a centralized pool that can be accessed by distributed spiders and scraping systems. It is built using Python and relies on Scrapy for high-performance crawling while Redis is used for data storage, communication, and task coordination between components. It includes crawlers that discover proxy servers, validators that test proxy availability and performance, and schedulers that manage crawling and validation tasks. HAipproxy aims to maintain a high availability proxy pool with low latency so that scraping frameworks can rotate proxies efficiently and avoid blocking during large-scale data collection. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit for All of Us

    ...Of course you may specify JASP for advanced data editing and RapidMiner for advanced prediction modeling. DSTK is written in C#, Java and Python to interface with R, NLTK, and Weka. It can be expanded with plugins using R Scripts. We have also created plugins for more statistical functions, and Big Data Analytics with Microsoft Azure HDInsights (Spark Server) with Livy.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25

    Web Crawler Security Tool

    A web crawler oriented to information security.

    ...The main task of this tool is to search and list all the links (pages and files) in a web site. The crawler has been completely rewritten in v1.0 bringing a lot of improvements: improved the data visualization, interactive option to download files, increased speed in crawling, exports list of found files into a separated file (useful to crawl a site once, then download files and analyse them with FOCA), generate an output log in Common Log Format (CLF), manage basic authentication and more! Many of the old features has been reimplemented and the most interesting one is the capability of the crawler to search for directory indexing.
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB