MrDebugger/jsoup

jsoup

A Python library that converts JSON structures into BeautifulSoup HTML/XML trees. It is the inverse of bs2json: you build HTML from dictionaries, with full support for attributes, comments, doctypes, and nested elements.

Python 3.8+ | Only dependency: beautifulsoup4


Table of Contents

| Section | Description |
| --- | --- |
| Installation | How to install |
| Quick Start | Basic usage |
| Input Format | How JSON maps to HTML |
| Features | Attributes, lists, comments, empty elements, doctypes |
| bs2json Roundtrip | Using bs2json output as jsoup input |
| Options | Custom labels, duplicate attributes, char refs |
| API Reference | JsonTreeBuilder, install() |
| Contributing | How to contribute |

Installation

pip install -U jsoup

Quick Start

from jsoup import JsonTreeBuilder
from bs4 import BeautifulSoup

json = {
    "body": {
        "h1": {"attrs": {"class": "title"}, "text": "Hello World"},
        "p": "This is a paragraph.",
        "br": None,
        "ul": {
            "li": ["Item 1", "Item 2", "Item 3"]
        }
    }
}

soup = BeautifulSoup(json, builder=JsonTreeBuilder)
print(soup.prettify())

Output:

<body>
 <h1 class="title">
  Hello World
 </h1>
 <p>
  This is a paragraph.
 </p>
 <br/>
 <ul>
  <li>
   Item 1
  </li>
  <li>
   Item 2
  </li>
  <li>
   Item 3
  </li>
 </ul>
</body>

Input Format

| JSON | HTML |
| --- | --- |
| {"p": "text"} | <p>text</p> |
| {"br": None} | <br/> |
| {"p": {"attrs": {"class": "x"}, "text": "hello"}} | <p class="x">hello</p> |
| {"li": ["a", "b", "c"]} | <li>a</li><li>b</li><li>c</li> |
| {"comment": "note"} | <!--note--> |
| {"doctype": "html"} | <!DOCTYPE html> |
| {"div": {"children": [{"p": "a"}, {"p": "b"}]}} | <div><p>a</p><p>b</p></div> |
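
To make the mapping above concrete, here is a minimal standard-library sketch of how such rules could be applied to produce an HTML string. It is not jsoup's implementation (jsoup builds a BeautifulSoup tree rather than a string). The names `render` and `render_tag` are hypothetical, and the sketch does not special-case void elements such as `img`.

```python
from html import escape

# Default key names, taken from the mapping table above.
ATTRS, TEXT, CHILDREN = "attrs", "text", "children"


def render(node):
    """Render a {tag: value} mapping to an HTML string (illustrative only)."""
    return "".join(render_tag(name, value) for name, value in node.items())


def render_tag(name, value):
    if name == "comment":
        return f"<!--{value}-->"
    if name == "doctype":
        return f"<!DOCTYPE {value}>"
    if isinstance(value, list):
        # A list value repeats the same tag once per item.
        return "".join(render_tag(name, item) for item in value)
    if value is None:
        # None marks an empty (self-closing) element.
        return f"<{name}/>"
    if isinstance(value, str):
        return f"<{name}>{escape(value)}</{name}>"

    # Otherwise the value is a dict: reserved keys first, remaining keys are child tags.
    attrs = "".join(
        f' {key}="{escape(str(val))}"' for key, val in value.get(ATTRS, {}).items()
    )
    inner = escape(value.get(TEXT, ""))
    inner += "".join(render(child) for child in value.get(CHILDREN, []))
    inner += "".join(
        render_tag(key, val)
        for key, val in value.items()
        if key not in (ATTRS, TEXT, CHILDREN)
    )
    return f"<{name}{attrs}>{inner}</{name}>"


print(render({"p": {"attrs": {"class": "x"}, "text": "hello"}}))  # <p class="x">hello</p>
print(render({"div": {"children": [{"p": "a"}, {"p": "b"}]}}))    # <div><p>a</p><p>b</p></div>
```

Each branch corresponds to one row of the table, which makes it easy to see which JSON shape triggers which kind of output.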

Features

Attributes

Attributes are passed via the attrs key:

json = {
    "a": {"attrs": {"href": "/home", "class": "nav"}, "text": "Home"},
    "img": {"attrs": {"src": "photo.jpg", "alt": "Photo"}}
}

Produces:

<a class="nav" href="/home">Home</a>
<img alt="Photo" src="photo.jpg"/>

Lists (Multiple Same Tags)

A list value creates multiple tags with the same name:

json = {"ul": {"li": ["Apple", "Banana", "Cherry"]}}

Produces:

<ul><li>Apple</li><li>Banana</li><li>Cherry</li></ul>

List items can also be dicts with nested content:

json = {"ul": {"li": [
    "Simple item",
    {"text": "Item with link", "a": {"attrs": {"href": "/"}, "text": "click"}}
]}}

Comments

json = {
    "body": {
        "comment": "This is a comment",
        "p": "Visible text"
    }
}
# Produces: <body><!--This is a comment--><p>Visible text</p></body>

Empty Elements

Use None for self-closing tags:

json = {"body": {"br": None, "hr": None}}
# Produces: <body><br/><hr/></body>

Doctypes

json = {
    "doctype": "html",
    "html": {"body": {"p": "content"}}
}

Nested Structures

Nesting works naturally:

json = {
    "html": {
        "head": {"title": "My Page"},
        "body": {
            "header": {
                "nav": {"ul": {"li": [
                    {"a": {"attrs": {"href": "/"}, "text": "Home"}},
                    {"a": {"attrs": {"href": "/about"}, "text": "About"}}
                ]}}
            },
            "main": {"h1": "Welcome", "p": "Content here"},
            "footer": {"p": "Copyright 2026"}
        }
    }
}
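
Because the input is ordinary Python data, repetitive parts of a nested structure can be generated rather than typed out. The following sketch builds the same navigation list as above from a list of pairs; the `links` and `nav` variable names are illustrative and not part of jsoup.

```python
# (href, label) pairs for the navigation menu; purely illustrative data.
links = [("/", "Home"), ("/about", "About")]

# One dict per <li>, each wrapping an <a> with an href attribute and link text.
nav = {
    "nav": {
        "ul": {
            "li": [
                {"a": {"attrs": {"href": href}, "text": label}}
                for href, label in links
            ]
        }
    }
}

print(nav["nav"]["ul"]["li"][1])
# {'a': {'attrs': {'href': '/about'}, 'text': 'About'}}
```

The resulting `nav` dict can be placed under "header" exactly like the hand-written version.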

bs2json Roundtrip

jsoup understands the children key from bs2json's ordered output, enabling roundtrip conversion:

from bs2json import BS2Json
from bs4 import BeautifulSoup
from jsoup import JsonTreeBuilder

# HTML -> JSON (bs2json)
html = "<html><body><h1>Title</h1><p>Text</p><h1>Another</h1></body></html>"
json_data = BS2Json(html).convert()
# {'html': {'body': {'children': [{'h1': 'Title'}, {'p': 'Text'}, {'h1': 'Another'}]}}}

# JSON -> HTML (jsoup)
soup = BeautifulSoup(json_data, builder=JsonTreeBuilder)
print(soup)
# <html><body><h1>Title</h1><p>Text</p><h1>Another</h1></body></html>

The children key preserves element order and can be combined with attrs on the same element:

json = {
    "table": {
        "attrs": {"id": "data"},
        "children": [
            {"tr": {"children": [{"th": "Name"}, {"th": "Score"}]}},
            {"tr": {"children": [{"td": "Alice"}, {"td": "95"}]}}
        ]
    }
}
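
Why an ordered children list matters can be shown with plain Python data, without jsoup. A dict holds each key only once, so grouping repeated tags under one key loses their position relative to their siblings. The helper names below are hypothetical and exist only for this illustration.

```python
# Element order in the source HTML: h1, p, h1
grouped = {"h1": ["Title", "Another"], "p": "Text"}  # same-name tags grouped in a list
ordered = {"children": [{"h1": "Title"}, {"p": "Text"}, {"h1": "Another"}]}


def tag_order_grouped(node):
    """Tag sequence recoverable from the grouped form: each list expands in place."""
    order = []
    for name, value in node.items():
        count = len(value) if isinstance(value, list) else 1
        order.extend([name] * count)
    return order


def tag_order_children(node):
    """Tag sequence recoverable from the 'children' form."""
    return [name for child in node["children"] for name in child]


print(tag_order_grouped(grouped))    # ['h1', 'h1', 'p']
print(tag_order_children(ordered))   # ['h1', 'p', 'h1']
```

Only the second form keeps the paragraph between the two headings, which is what a faithful roundtrip needs.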

Options

Using install() for Cleaner Syntax

Register jsoup so you can use "jsoup" as a parser string:

from jsoup import install
install()

from bs4 import BeautifulSoup
soup = BeautifulSoup({"p": "hello"}, "jsoup")

Custom Label Names

Override the default key names for attributes, text, and children:

json = {"p": {"@": {"class": "x"}, "#text": "hello"}}
soup = BeautifulSoup(json, builder=JsonTreeBuilder,
                     attr_name='@', text_name='#text')
# <p class="x">hello</p>

Duplicate Attributes

Control how duplicate attribute keys are handled when attrs is a list of dicts:

json = {"p": {"attrs": [{"class": "a"}, {"class": "b"}], "text": "hello"}}

# Replace (default): last value wins
soup = BeautifulSoup(json, builder=JsonTreeBuilder, on_duplicate_attribute="replace")

# Ignore: first value wins
soup = BeautifulSoup(json, builder=JsonTreeBuilder, on_duplicate_attribute="ignore")

# Callable: custom merge logic
def merge(attrs, name, value):
    attrs[name] += " " + value

soup = BeautifulSoup(json, builder=JsonTreeBuilder, on_duplicate_attribute=merge)
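
The three strategies can be illustrated on plain dicts. This is a sketch of the semantics described above, not jsoup's code: `merge_attrs` and `join_with_space` are hypothetical helpers, and the callable signature `(attrs, name, value)` is taken from the `merge` example.

```python
def merge_attrs(attr_dicts, on_duplicate="replace"):
    """Fold a list of attribute dicts into one dict (hypothetical helper)."""
    merged = {}
    for attrs in attr_dicts:
        for name, value in attrs.items():
            if name not in merged:
                merged[name] = value
            elif on_duplicate == "replace":
                merged[name] = value               # last value wins
            elif on_duplicate == "ignore":
                pass                               # first value wins
            elif callable(on_duplicate):
                on_duplicate(merged, name, value)  # caller decides how to combine
            else:
                raise ValueError(f"unknown strategy: {on_duplicate!r}")
    return merged


def join_with_space(attrs, name, value):
    """Custom strategy: append the new value to the existing one."""
    attrs[name] += " " + value


pairs = [{"class": "a"}, {"class": "b"}]
print(merge_attrs(pairs))                    # {'class': 'b'}
print(merge_attrs(pairs, "ignore"))          # {'class': 'a'}
print(merge_attrs(pairs, join_with_space))   # {'class': 'a b'}
```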

Character References

Special characters such as <, >, and & are escaped to HTML entities automatically:

json = {"p": "1<2 && 2>1"}
soup = BeautifulSoup(json, builder=JsonTreeBuilder)
# <p>1&lt;2 &amp;&amp; 2&gt;1</p>
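
The same escaping can be reproduced with the standard library, which is handy for predicting what a given string will look like in the output. Whether jsoup uses `html.escape` internally is not stated here, so treat this only as an equivalent illustration.

```python
from html import escape, unescape

raw = "1<2 && 2>1"

# quote=False leaves quotation marks alone, as is usual for element text.
escaped = escape(raw, quote=False)
print(escaped)                    # 1&lt;2 &amp;&amp; 2&gt;1

# Unescaping restores the original string.
print(unescape(escaped) == raw)   # True
```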

API Reference

JsonTreeBuilder

A BeautifulSoup TreeBuilder that accepts JSON dicts as input.

from bs4 import BeautifulSoup
from jsoup import JsonTreeBuilder
soup = BeautifulSoup(json_data, builder=JsonTreeBuilder, **options)

Options (passed as kwargs to BeautifulSoup):

| Option | Default | Description |
| --- | --- | --- |
| attr_name | "attrs" | JSON key for element attributes |
| text_name | "text" | JSON key for text content |
| children_name | "children" | JSON key for ordered children list |
| on_duplicate_attribute | "replace" | How to handle duplicate attrs: "replace", "ignore", or callable |
| convert_charref | True | Whether to escape HTML entities |

install()

Register JsonTreeBuilder so "jsoup" can be used as a parser string:

from jsoup import install
install(debug=False)

After calling install():

soup = BeautifulSoup(json_data, "jsoup")

Contributing

See CONTRIBUTING.md for development setup, versioning guide, and how to submit changes.
