text parsing free download

Showing 237 open source projects for "text parsing"

1

text-extract-api

Document (PDF, Word, PPTX ...) extraction and parse API

...Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction capabilities into a unified API that standardizes the output. The platform supports automated processing pipelines that detect file types and apply the appropriate extraction method to obtain the most accurate text representation possible. It can be integrated into document analysis systems, knowledge retrieval tools, and AI pipelines that rely on clean textual data. ...

Downloads: 3 This Week

Last Update: 2026-03-05
2

LiteParse

A fast, helpful, and open-source document parser

LiteParse is an open-source lightweight parsing library designed to extract structured data from unstructured text using large language models in an efficient and cost-effective manner. It focuses on simplifying the process of turning raw text into structured outputs such as JSON by providing a streamlined interface for prompt-based parsing. The system is designed to minimize overhead, making it suitable for applications where performance and cost are critical considerations. ...

Downloads: 8 This Week

Last Update: 4 days ago
3

YAML

JavaScript parser and stringifier for YAML

yaml is a definitive library for YAML, the human-friendly data serialization standard. This library supports both YAML 1.1 and YAML 1.2 and all common data schemas, and it passes all of the yaml-test-suite tests. It can accept any string as input without throwing, parsing as much YAML out of it as it can, and supports parsing, modifying, and writing YAML comments and blank lines. The library is released under the ISC open source license, and the code is available on GitHub. It has no external...

Downloads: 12 This Week

Last Update: 2026-03-21
4

TextFSM

Python module for parsing semi-structured text into Python tables

TextFSM is a Python library created by Google that provides a template-based state machine engine for parsing semi-structured text. It is particularly useful for extracting structured data from command-line interface (CLI) outputs, such as those from network devices, routers, and switches. By defining parsing logic through reusable template files, TextFSM transforms unstructured text into structured data like lists or tables without requiring complex regular expression code. ...

Downloads: 0 This Week

Last Update: 2025-10-11
5

npm-pdfreader

Parse text and tables from PDF files.

npm-pdfreader is a Node.js library for reading text and parsing tables from PDF files. It supports tabular data with automatic column detection and rule-based parsing, making it useful for extracting structured data from PDFs.

Downloads: 6 This Week

Last Update: 2025-11-01
6

Ksoup

Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML

Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML, extracting HTML tags, attributes, and text, and encoding and decoding HTML entities.

Downloads: 5 This Week

Last Update: 2025-06-08
7

Markdig

A fast, powerful, CommonMark compliant, extensible Markdown processor

A fast, powerful, CommonMark-compliant, extensible Markdown processor for .NET. It offers a very fast parser and HTML renderer (no regular expressions) that is very lightweight in terms of GC pressure. Its abstract syntax tree carries precise source code locations, which is useful when building a Markdown editor. Check out MarkdownEditor for Visual Studio, powered by Markdig! Even the core Markdown/CommonMark parsing is pluggable, so built-in Markdown/CommonMark parsing can be disabled (e.g. disabling HTML parsing) or...

Downloads: 6 This Week

Last Update: 2026-03-27
8

RAG Anything

RAG-Anything: All-in-One RAG Framework

...The system uses a multi-stage pipeline (e.g., document parsing, content analysis, knowledge graph construction, intelligent retrieval) so queries can navigate across modalities with deeper understanding and relevance.

Downloads: 4 This Week

Last Update: 2026-03-24
9

ChordSheetJS

A JavaScript library for parsing and formatting chords and chord sheets

ChordSheetJS is a JavaScript library for parsing, formatting, and transposing chord sheets. It supports various chord sheet formats and provides tools for rendering and manipulating chord and lyric data.

Downloads: 4 This Week

Last Update: 2026-04-06
10

LlamaParse

Parse files for optimal RAG

LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.

Downloads: 2 This Week

Last Update: 2026-02-13
11

GROBID

A machine learning software for extracting information

GROBID is a machine learning library for extracting, parsing, and restructuring raw documents such as PDFs into structured XML/TEI-encoded documents, with a particular focus on technical and scientific publications. Development started in 2008 as a hobby. In 2011 the tool was made available as open source. Work on GROBID has been steady as a side project since the beginning and is expected to continue as such. Header extraction and parsing from articles in PDF format. The...

Downloads: 5 This Week

Last Update: 2026-04-07
12

markdown-it

Markdown parser, done right. 100% CommonMark support, extensions

markdown-it is a fast and extensible JavaScript-based Markdown parser designed to convert Markdown text into HTML while maintaining strict compliance with the CommonMark specification and offering additional syntax enhancements. It is widely used in web applications, documentation tools, and content platforms due to its high performance and flexibility. The library is built with a rule-based parsing system that allows developers to customize or replace syntax rules, making it adaptable to a wide variety of use cases. ...

Downloads: 0 This Week

Last Update: 2026-03-29
13

tree-sitter

An incremental parsing system for programming tools

Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. General enough to parse any programming language. Fast enough to parse on every keystroke in a text editor. Robust enough to provide useful results even in the presence of syntax errors. Dependency-free so that the runtime library (which is written in pure C) can be embedded in any application. ...

Downloads: 9 This Week

Last Update: 2026-03-31
14

Notion-to-MD

Convert Notion pages, blocks, and lists of blocks to Markdown

Notion-to-MD is a Node.js package that converts Notion pages, blocks, and lists of blocks to Markdown (nesting is supported) using notion-sdk-js.

Downloads: 5 This Week

Last Update: 2025-07-19
15

Helix

A post-modern modal text editor

Helix is a modal (Kakoune/Vim-inspired) terminal-based text editor written in Rust. It features modern modal editing, multiple selections, smart syntax highlighting built on tree-sitter's fast, incremental parsing, and built-in Language Server Protocol (LSP) integration for code intelligence.

Downloads: 9 This Week

Last Update: 2025-07-31
16

zpdf

Zero-copy PDF text extraction library written in Zig

zpdf is a high-performance PDF text extraction library written in Zig that focuses on speed, low overhead, and modern parsing techniques. It leans heavily on memory-mapped file reading and zero-copy patterns where possible, so it can scan large PDFs without repeatedly copying data around in memory. The library supports streaming extraction using efficient arena allocation, making it well suited for workloads that need to process big documents quickly or in batches.

Downloads: 0 This Week

Last Update: 2026-02-01
17

commonmark-java

Java library for parsing and rendering CommonMark (Markdown)

Java library for parsing and rendering Markdown text according to the CommonMark specification (and some extensions). Provides classes for parsing input to an abstract syntax tree of nodes (AST), visiting and manipulating nodes, and rendering to HTML. It started out as a port of commonmark.js, but has since evolved into a full library with a nice API.

Downloads: 0 This Week

Last Update: 2026-03-31
18

ELisp Tree-sitter

Tree-sitter bindings for Emacs Lisp

...The minor mode tree-sitter-mode provides a buffer-local syntax tree, which is kept up-to-date with changes to the buffer’s text. Run M-x tree-sitter-hl-mode to replace the regex-based highlighting provided by font-lock-mode with tree-based syntax highlighting.

Downloads: 3 This Week

Last Update: 2026-01-16
19

ANTLR

Parser generator to read, process, or translate structured text

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. From a grammar, ANTLR generates a parser that can build and walk parse trees. It is widely used in academia and industry to build all sorts of languages, tools, and frameworks. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day. ...

Downloads: 5 This Week

Last Update: 2024-08-03
20

py-pdf-parser

A Python tool to help extract information from structured PDFs

py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.

Downloads: 8 This Week

Last Update: 2025-04-28
21

amrlib

A Python library that makes AMR parsing, generation and visualization simple

amrlib is a Python module designed to make processing for Abstract Meaning Representation (AMR) simple by providing the following functions: Sentence to Graph (StoG) parsing to create AMR graphs from English sentences; Graph to Sentence (GtoS) generation for turning AMR graphs into English sentences; a Qt-based GUI to facilitate the conversion of sentences to graphs and back to sentences; methods to plot AMR graphs...

Downloads: 0 This Week

Last Update: 2026-03-07
22

Extractous

Fast and efficient unstructured data extraction

Extractous is a Rust-based unstructured data extraction library focused on fast local parsing of documents and other content-heavy files. Its purpose is to extract text and metadata efficiently from formats such as PDF, Word, HTML, email archives, images, and more, without depending on external APIs or separate parsing servers. The project emphasizes performance and low memory usage, and its maintainers describe it as a local-first alternative to heavier extraction stacks. ...

Downloads: 1 This Week

Last Update: 2026-03-06
23

SemTools

Semantic search and document parsing tools for the command line

SemTools is an open-source command-line toolkit designed for document parsing, semantic indexing, and semantic search workflows. The project focuses on enabling developers and AI agents to process large document collections and extract meaningful semantic representations that can be searched efficiently. Built with Rust for performance and reliability, the toolchain provides fast processing of text and structured documents while maintaining low system overhead.

Downloads: 13 This Week

Last Update: 2026-03-13
24

mavonEditor

A markdown editor based on Vue

A Markdown editor based on Vue that supports a variety of personalized features. All toolbar properties default to true; you can pass a custom object to override them. The language parsing files and the code highlighting from highlight.js are loaded on demand. github-markdown-css and KaTeX load only when the editor is mounted.

Downloads: 1 This Week

Last Update: 2025-03-05
25

dots.ocr

Multilingual Document Layout Parsing in a Single Vision-Language Model

dots.ocr is a cutting-edge multilingual document parsing system built on a unified vision-language model that combines layout detection, text recognition, and structural understanding into a single architecture. Unlike traditional OCR pipelines that rely on multiple specialized components, dots.ocr integrates these processes end-to-end, reducing error propagation and improving consistency across tasks.

Downloads: 0 This Week

Last Update: 2026-03-24