pdf extract free download

Showing 118 open source projects for "pdf extract"

1

LangChain Extract

Did you say you like data?

LangChain Extract is an open-source reference application designed to demonstrate how large language models can be used to extract structured data from unstructured text and document files. The project implements a lightweight web service that allows developers to define extraction schemas and apply them to various sources such as plain text, HTML, or PDF documents.

Downloads: 1 This Week

Last Update: 2026-03-09
See Project
2

py-pdf-parser

A Python tool to help extract information from structured PDFs

py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.

Downloads: 8 This Week

Last Update: 2025-04-28
See Project
3

text-extract-api

Document (PDF, Word, PPTX ...) extraction and parse API

text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models. Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction...

Downloads: 4 This Week

Last Update: 2026-03-05
See Project
4

PDFsam

PDFsam, a desktop application to split, merge, mix, rotate PDF files

PDFsam Basic is our free and open-source desktop application to split, merge, extract pages, rotate and mix PDF files. PDFsam Visual is a powerful tool to visually compose PDF files, reorder pages, delete pages, split, merge, rotate, encrypt, decrypt, extract text, convert to grayscale, and crop PDF files. PDFsam Basic is written using JavaFX. Since version 4, it is released as a self-contained application that bundles a jlinked JDK, while version 3 requires a Java Runtime Environment 8 with JavaFX installed in order to run.

Downloads: 191 This Week

Last Update: 2026-03-30
See Project
5

PdfPig

Read and extract text and other content from PDFs in C#

This project allows users to read and extract text and other content from PDF files. In addition the library can be used to create simple PDF documents containing text and geometrical shapes.

Downloads: 9 This Week

Last Update: 2026-03-22
See Project
6

PyPDF

A pure-python PDF library capable of splitting, merging, cropping

pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.

Downloads: 6 This Week

Last Update: 2 days ago
See Project
7

pdfly

CLI tool to extract (meta)data from PDF and manipulate PDF files

A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.

Downloads: 5 This Week

Last Update: 2025-10-13
See Project
8

PDFIO.jl

PDF Reader Library for Native Julia.

PDFIO is a native Julia implementation for reading PDF files. It's a 100% Julia implementation of the PDF specification. Other than a few well-established algorithms like flate decode (zlib library) or cryptographic operations (OpenSSL library), almost all of the APIs are written in native Julia. PDF files have been in existence for over three decades. Implementations of PDF writers do not always follow the specification, and they may even vary significantly from vendor to vendor. Every time, you...

Downloads: 4 This Week

Last Update: 2025-02-05
See Project
9

unipdf

Golang PDF library for creating and processing PDF files (pure go)

UniDoc UniPDF is a PDF library for Go (golang) with capabilities for creating, reading, and processing PDF files. The library is written and supported by FoxyUtils.com, where the library is used to power many of its services. Every release of our libraries is automatically tested against known vulnerabilities and does not pass unless everything is remediated. All changes are carefully reviewed by our team.

Downloads: 6 This Week

Last Update: 2026-04-09
See Project
10

pikepdf

A Python library for reading and writing PDF, powered by QPDF

pikepdf is a Python library allowing the creation, manipulation, and repair of PDFs. It provides a Pythonic wrapper around the C++ PDF content transformation library, QPDF. Python + QPDF = “py” + “qpdf” = “pyqpdf”, which looks like a dyslexia test and is no fun to type. But say “pyqpdf” out loud, and it sounds like “pikepdf”. pikepdf is a library intended for developers who want to create, manipulate, parse, repair, and abuse the PDF format. It supports reading and writing PDFs, including...

Downloads: 81 This Week

Last Update: 2026-03-18
See Project
11

PDFPatcher

A versatile toolkit for PDF manipulation

PDFPatcher (aka “PDF补丁丁”) is a versatile toolkit for PDF manipulation—editing document metadata, bookmarks, page layout, content restrictions, rotation, compression, merging/splitting, image extraction, and more, all within an intuitive interface. Merge/split PDFs or images, preserve or add bookmarks, and set page dimensions. Batch style/color/target changes, regex/XPath search/replace, mid‑page positioning. Modify PDF metadata, page numbers, links, initial view mode, and remove open actions.

Downloads: 37 This Week

Last Update: 2025-08-14
See Project
12

iLovePDF Api

iLovePDF Rest Api - PHP Library

...We offer a simple and concise API Reference and Guide as well as API Libraries with their own docs too. Our infrastructure uses the best PDF technology for processing PDF files. Merge and split documents with a variety of custom options. Remove, extract or organize PDF pages as you need. Reduce the size of your PDF while maintaining its original quality and formatting. Easily convert Images, MS Word, PowerPoint and Excel files into non-editable PDF documents. Convert PDF documents to JPG images or to PDF/A format.

Downloads: 6 This Week

Last Update: 2024-06-20
See Project
13

Documind

Open-source platform for extracting structured data from documents

Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.

Downloads: 7 This Week

Last Update: 2025-02-21
See Project
14

PDFCraft

PDFCraft is a free, privacy-focused PDF toolkit

PDFCraft is an extensible toolkit for creating, editing, and transforming PDF documents with both a graphical interface and a scripting API, making it useful for users ranging from casual editors to automated document processors. At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite.

Downloads: 14 This Week

Last Update: 2026-04-07
See Project
15

Scribe.js

JavaScript OCR and text extraction for images and PDFs

Scribe.js is a JavaScript library that provides Optical Character Recognition (OCR) and text extraction capabilities for both images and PDF documents, aimed at developers who want to build OCR features directly into their applications. The library can take image files (such as PNG or JPEG) and recognize the text they contain, and it can also extract text from PDF files that either already contain text or are image-based scans, using modern web standards and WebAssembly under the hood. ...

Downloads: 9 This Week

Last Update: 2026-03-14
See Project
16

PyMuPDF

Python bindings for MuPDF's rendering library.

MuPDF is a lightweight PDF, XPS, and E-book viewer. MuPDF consists of a software library, command line tools, and viewers for various platforms. The renderer in MuPDF is tailored for high-quality anti-aliased graphics. It renders text with metrics and spacing accurate to within fractions of a pixel for the highest fidelity in reproducing the look of a printed page on the screen. The viewer is small, fast, yet complete. It supports many document formats, such as PDF, XPS, OpenXPS, CBZ, EPUB,...

Downloads: 21 This Week

Last Update: 2026-03-17
See Project
17

Stirling-PDF

#1 Locally hosted web application that allows you to work on PDFs

This is a robust, locally hosted web-based PDF manipulation tool using Docker. It enables you to carry out various operations on PDF files, including splitting, merging, converting, reorganizing, adding images, rotating, compressing, and more. This locally hosted web application has evolved to encompass a comprehensive set of features, addressing all your PDF requirements. Stirling PDF does not initiate any outbound calls for record-keeping or tracking purposes. All files and PDFs...

Downloads: 117 This Week

Last Update: 2 days ago
See Project
18

Image Toolbox

Image Toolbox is a powerful picture editor, which can crop

Image Toolbox is a powerful picture editor, which can crop, apply filters, add some drawings, erase background, edit EXIF, or even create a PDF file.

Downloads: 25 This Week

Last Update: 2026-04-09
See Project
19

Umi-OCR

OCR software, free and offline

Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. Users can interact with Umi-OCR through a graphical interface, command-line options, or HTTP interfaces, making it adaptable to both casual desktop usage and programmatic automation. ...

Downloads: 35 This Week

Last Update: 2026-01-15
See Project
20

Free PDF Editor

A free, open-source PDF editor for basic editing tasks

Downloads: 0 This Week

Last Update: 2025-10-16
See Project
21

Sprint PDF Editor (Smarter PDF Solution)

Edit, Convert, Extract, Export, Secure and PDF Imposition.

Sprint PDF Editor® offers a productive, modern, innovative, clean, and colourful GUI with 50+ functions for faster, smarter, and seamless workflows. Sprint PDF Editor & Reader is a complete PDF solution: supercharge your workflows with imposition, extract, compress, watermark, protect & secure, split & merge, crop pages, printing, stamp, and more. Your privacy, our priority: protect your data with complete confidence.

2 Reviews

Downloads: 14 This Week

Last Update: 2026-03-08
See Project
22

PDF Tinkerer

Tinker with PDF files

Tinker with PDF files. Download the JAR file for your OS (e.g. Windows) and double click on it. You will need at least Java 21 (e.g. https://adoptium.net/temurin/releases/?os=any&arch=any&version=21) to run this Desktop-App. The latest releases of PDF Tinkerer can now be found on: https://gitlab.com/gjwu/pdf-tinkerer/-/releases

Downloads: 2 This Week

Last Update: 2025-05-21
See Project
23

PDF Split and Merge

Split and merge PDF files on any platform

Split and merge PDF files with PDFsam, an easy-to-use desktop tool with graphical, command line and web interface.

Downloads: 348 This Week

Last Update: 2026-03-30
See Project
24

PDFTK Builder Enhanced

Enhanced version of the PDFTK Builder GUI for PDF Toolkit on Windows

Free and open source GUI application for manipulating PDF files using the Windows version of PDF Toolkit (PDFtk) - split, merge, stamp, number pages, rotate, metadata, bookmarks, attachments, etc. This project is a fork of PDFTK Builder by Angus Johnson that enhances the user interface, adds functions, and enables use of later versions of PDFtk. OS: Windows. Author: David King. License: GPLv3.

4 Reviews

Downloads: 191 This Week

Last Update: 2025-11-22
See Project
25

deepdoctection

A Repo For Document AI

deepdoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. It is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models but enables you to build pipelines using highly acknowledged libraries for object detection, OCR and selected NLP tasks and provides an integrated framework for...

Downloads: 3 This Week

Last Update: 2026-04-09
See Project