  1. Metadata-Version: 2.4
  2. Name: json_repair
  3. Version: 0.58.6
  4. Summary: A package to repair broken json strings
  5. Author-email: Stefano Baccianella <4247706+mangiucugna@users.noreply.github.com>
  6. License-Expression: MIT
  7. Project-URL: Homepage, https://github.com/mangiucugna/json_repair/
  8. Project-URL: Bug Tracker, https://github.com/mangiucugna/json_repair/issues
  9. Project-URL: Live demo, https://mangiucugna.github.io/json_repair/
  10. Keywords: JSON,REPAIR,LLM,PARSER
  11. Classifier: Programming Language :: Python :: 3
  12. Classifier: Operating System :: OS Independent
  13. Requires-Python: >=3.10
  14. Description-Content-Type: text/markdown
  15. License-File: LICENSE
  16. Provides-Extra: schema
  17. Requires-Dist: jsonschema>=4.21; extra == "schema"
  18. Requires-Dist: pydantic>=2; extra == "schema"
  19. Dynamic: license-file
  20. [![PyPI](https://img.shields.io/pypi/v/json-repair)](https://pypi.org/project/json-repair/)
  21. ![Python version](https://img.shields.io/badge/python-3.10+-important)
  22. [![PyPI downloads](https://img.shields.io/pypi/dm/json-repair)](https://pypi.org/project/json-repair/)
  23. [![PyPI Downloads](https://static.pepy.tech/badge/json-repair)](https://pepy.tech/projects/json-repair)
  24. [![Github Sponsors](https://img.shields.io/github/sponsors/mangiucugna)](https://github.com/sponsors/mangiucugna)
  25. [![GitHub Repo stars](https://img.shields.io/github/stars/mangiucugna/json_repair?style=flat)](https://github.com/mangiucugna/json_repair/stargazers)
  26. English | [中文](README.zh.md)
  27. This simple package repairs invalid JSON strings. To see all the cases it handles, check out the unit tests.
  28. ![banner](banner.png)
  29. ---
  30. # Think about sponsoring this library!
  31. This library is free for everyone and is maintained and developed as a side project. If you find it useful for your work, consider becoming a sponsor via this link: https://github.com/sponsors/mangiucugna
  32. ## Premium sponsors
  33. - [Icana-AI](https://github.com/Icana-AI) Makers of CallCoach, the world's best Call Centre AI Coach. Visit [https://www.icana.ai/](https://www.icana.ai/)
  34. - [mjharte](https://github.com/mjharte)
  35. ---
  36. # Demo
  37. If you are unsure whether this library will fix your specific problem, or simply want your JSON validated online, visit the demo site on GitHub Pages: https://mangiucugna.github.io/json_repair/
  38. Or listen to an [audio deep dive generated by Google's NotebookLM](https://notebooklm.google.com/notebook/05312bb3-f6f3-4e49-a99b-bd51db64520b/audio) for an introduction to the module.
  39. ---
  40. # Motivation
  41. Some LLMs are a bit iffy when it comes to returning well-formed JSON data: sometimes they skip a parenthesis and sometimes they add some words to it, because that's what an LLM does.
  42. Luckily, the mistakes LLMs make are simple enough to be fixed without destroying the content.
  43. I searched for a lightweight Python package that could reliably fix this problem but couldn't find any.
  44. *So I wrote one*
  45. # Supported use cases
  46. ### Fixing Syntax Errors in JSON
  47. - Missing quotes, misplaced commas, unescaped characters, and incomplete key-value pairs.
  48. - Improperly formatted values (true, false, null) and corrupted key-value structures.
  49. ### Repairing Malformed JSON Arrays and Objects
  50. - Incomplete or broken arrays/objects, repaired by adding necessary elements (e.g., commas, brackets) or default values (null, "").
  51. - Extra non-JSON characters, such as comments or improperly placed characters, which are cleaned up while keeping the structure valid.
  52. ### Auto-Completion for Missing JSON Values
  53. - Automatically completes missing values in JSON fields with reasonable defaults (like empty strings or null), ensuring validity.
  54. # How to use
  55. Install the library with pip:
  56.     pip install json-repair
  57. Then you can use it in your code like this:
  58.     from json_repair import repair_json
  59.     good_json_string = repair_json(bad_json_string)
  60.     # If the string was super broken this will return an empty string
  61. You can use this library to completely replace `json.loads()`:
  62.     import json_repair
  63.     decoded_object = json_repair.loads(json_string)
  64. or just
  65.     import json_repair
  66.     decoded_object = json_repair.repair_json(json_string, return_objects=True)
  67. ### Avoid this antipattern
  68. Some users of this library adopt the following pattern:
  69.     obj = {}
  70.     try:
  71.         obj = json.loads(string)
  72.     except json.JSONDecodeError as e:
  73.         obj = json_repair.loads(string)
  74.     ...
  75. This is wasteful because `json_repair` already checks whether the JSON is valid. If you still want to run that check yourself, add `skip_json_loads=True` to the call, as explained in the Performance considerations section below.
  76. ### Read json from a file or file descriptor
  77. JSON repair also provides a drop-in replacement for `json.load()`:
  78.     import json_repair
  79.     try:
  80.         file_descriptor = open(fname, 'rb')
  81.     except OSError:
  82.         ...
  83.     with file_descriptor:
  84.         decoded_object = json_repair.load(file_descriptor)
  85. and another method to read from a file:
  86.     import json_repair
  87.     try:
  88.         decoded_object = json_repair.from_file(json_file)
  89.     except OSError:
  90.         ...
  91.     except IOError:
  92.         ...
  93. Keep in mind that the library will not catch any IO-related exceptions; you need to handle those yourself.
  94. ### Non-Latin characters
  95. When working with non-Latin characters (such as Chinese, Japanese, or Korean), you need to pass `ensure_ascii=False` to `repair_json()` in order to preserve the non-Latin characters in the output.
  96. Here's an example using Chinese characters:
  97.     repair_json("{'test_chinese_ascii':'统一码'}")
  98. will return
  99.     {"test_chinese_ascii": "\u7edf\u4e00\u7801"}
  100. Passing `ensure_ascii=False` instead:
  101.     repair_json("{'test_chinese_ascii':'统一码'}", ensure_ascii=False)
  102. will return
  103.     {"test_chinese_ascii": "统一码"}
  104. ### JSON dumps parameters
  105. More generally, `repair_json` accepts all the parameters that `json.dumps` accepts and passes them through (for example `indent`).
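
Since these parameters are handed to `json.dumps`, their effect can be previewed with the standard library alone. The snippet below is only a sketch of what `ensure_ascii` and `indent` do at the `json.dumps` level; it does not call `json_repair`, and the variable names are just for illustration.

```python
import json

data = {"test_chinese_ascii": "统一码"}

# Default behavior: non-ASCII characters are escaped as \uXXXX sequences.
escaped = json.dumps(data)

# ensure_ascii=False keeps the original characters in the output.
preserved = json.dumps(data, ensure_ascii=False)

# indent pretty-prints the result, one key per line.
pretty = json.dumps(data, ensure_ascii=False, indent=2)

print(escaped)
print(preserved)
print(pretty)
```

The first two printed lines match the outputs shown in the Non-Latin characters section above.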
  106. ### Performance considerations
  107. If you find this library too slow because it calls `json.loads()` first, you can skip that step by passing `skip_json_loads=True` to `repair_json`, like this:
  108.     from json_repair import repair_json
  109.     good_json_string = repair_json(bad_json_string, skip_json_loads=True)
  110. I chose not to use any fast JSON library to avoid external dependencies, so that anybody can use it regardless of their stack.
  111. Some rules of thumb:
  112. - Setting `return_objects=True` will always be faster because the parser already returns an object and doesn't have to serialize it to JSON
  113. - `skip_json_loads` is faster only if you know for certain that the string is not valid JSON
  114. - If you are having issues with escaping, pass the string as a **raw** string, like: `r"string with escaping\""`
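
To see why `skip_json_loads` only pays off for input you already know is broken, here is a rough sketch of the "try the standard parser first" flow described above. It is not the library's code: `parse_with_fast_path` and `toy_repair` are hypothetical names, and the toy repair only knows how to append a missing closing brace.

```python
import json


def parse_with_fast_path(text, repair, skip_json_loads=False):
    """Return parsed JSON, trying the standard parser first unless told not to.

    `repair` is a hypothetical stand-in for a repairing parser; it is not
    part of json_repair's API.
    """
    if not skip_json_loads:
        try:
            return json.loads(text)  # cheap when the input is already valid
        except json.JSONDecodeError:
            pass  # invalid input: the attempt above was wasted work
    return repair(text)


repair_calls = []


def toy_repair(text):
    """Toy repair for illustration only: append a missing closing brace."""
    repair_calls.append(text)
    fixed = text if text.rstrip().endswith("}") else text + "}"
    return json.loads(fixed)


# Valid input never reaches the repair function.
valid = parse_with_fast_path('{"a": 1}', toy_repair)

# Input known to be broken can skip the doomed json.loads attempt.
broken = parse_with_fast_path('{"a": 1', toy_repair, skip_json_loads=True)
```

After running this, `repair_calls` contains only the broken string, which is the whole point: valid input is handled by the fast path, and skipping it only helps when that path would have failed anyway.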
  115. ### Strict mode
  116. By default `json_repair` does its best to “fix” input, even when the JSON is far from valid.
  117. In some scenarios you want the opposite behavior and need the parser to error out instead of repairing; pass `strict=True` to `repair_json`, `loads`, `load`, or `from_file` to enable that mode:
  118. ```
  119. from json_repair import repair_json
  120. repair_json(bad_json_string, strict=True)
  121. ```
  122. The CLI exposes the same behavior with `json_repair --strict input.json` (or piping data via stdin).
  123. In strict mode the parser raises `ValueError` as soon as it encounters structural issues such as duplicate keys, missing `:` separators, empty keys/values introduced by stray commas, multiple top-level elements, or other ambiguous constructs. This is useful when you just need validation with friendlier error messages while still benefiting from json_repair’s resilience elsewhere in your stack.
  124. Strict mode still honors `skip_json_loads=True`; combining them lets you skip the initial `json.loads` check but still enforce strict parsing rules.
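
To give a feel for one of the checks listed above, duplicate keys can be rejected with the standard library alone. This is a sketch of the idea, not how `json_repair` implements strict mode; `reject_duplicate_keys` and `strict_loads` are hypothetical names.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises instead of silently keeping the last value."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


def strict_loads(text):
    """Parse JSON, raising ValueError on syntax errors and duplicate keys."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)


print(strict_loads('{"a": 1, "b": 2}'))

try:
    strict_loads('{"a": 1, "a": 2}')
except ValueError as error:
    print(f"rejected: {error}")
```

Plain `json.loads` would accept the second input and keep the last value, which is exactly the kind of ambiguity a strict parser refuses to resolve for you.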
  125. ### Schema-guided repairs
  126. Schema-guided repairs are currently considered in beta. Bugs are to be expected.
  127. You can guide repairs with a JSON Schema (or a Pydantic v2 model). When enabled, the parser will:
  128. - Fill missing values (defaults, required fields).
  129. - Coerce scalars where safe (e.g., `"1"` → `1` for integer fields, and `"yes"`/`"no"`/`1`/`0` for booleans).
  130. - Drop properties/items that the schema disallows.
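
The three behaviors above can be pictured with a toy, standard-library-only pass over a flat object. This is not `json_repair`'s implementation: `coerce_scalar`, `coerce_to_schema` and `BOOLEAN_WORDS` are hypothetical names, and real JSON Schema handling covers far more than this sketch.

```python
BOOLEAN_WORDS = {"yes": True, "no": False, "true": True, "false": False}


def coerce_scalar(value, expected_type):
    """Coerce a scalar to the expected JSON Schema type when that is safe."""
    if expected_type == "integer" and isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return value  # not safely coercible, leave untouched
    if expected_type == "boolean":
        if isinstance(value, str) and value.lower() in BOOLEAN_WORDS:
            return BOOLEAN_WORDS[value.lower()]
        if value in (0, 1) and not isinstance(value, bool):
            return bool(value)
    return value


def coerce_to_schema(obj, schema):
    """Toy pass over a flat object: coerce scalars, drop extras, fill defaults."""
    properties = schema.get("properties", {})
    extras_allowed = schema.get("additionalProperties", True) is not False
    result = {}
    for key, value in obj.items():
        if key in properties:
            result[key] = coerce_scalar(value, properties[key].get("type"))
        elif extras_allowed:
            result[key] = value
    for key, spec in properties.items():
        if key not in result and "default" in spec:
            result[key] = spec["default"]
    return result


example_schema = {
    "type": "object",
    "properties": {
        "value": {"type": "integer"},
        "active": {"type": "boolean"},
        "tags": {"type": "array", "default": []},
    },
    "additionalProperties": False,
}
example_result = coerce_to_schema(
    {"value": "1", "active": "yes", "extra": "x"}, example_schema
)
print(example_result)
```

Here `"1"` becomes the integer `1`, `"yes"` becomes `True`, the undeclared `extra` property is dropped, and the missing `tags` field is filled from its default.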
  131. Schema mode can be selected with `schema_repair_mode`:
  132. - `standard` (default): existing schema-guided behavior.
  133. - `salvage`: includes `standard` and also:
  134.   - drops invalid array items when individual items cannot be repaired;
  135.   - maps arrays to objects by property order when schema/object shape is unambiguous;
  136.   - unwraps a root single-item array to an object when the root schema expects an object (`[{...}] -> {...}`);
  137.   - fills missing required fields only when a safe value can be inferred (`default`, `const`, first `enum`, or empty array/object when allowed by schema constraints).
  138. This is especially useful when you need deterministic, schema-valid outputs for downstream validation, storage, or typed processing. If the input cannot be repaired to satisfy the schema, `json_repair` raises `ValueError`.
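
As a rough picture of the first `salvage` rule (dropping array items that cannot be repaired), here is a toy filter written with the standard library only. It is an illustration under my own assumptions, not the library's algorithm or its guaranteed output; `is_number` and `salvage_items` are hypothetical helpers.

```python
def is_number(value):
    """True for JSON numbers; bool is excluded even though it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def salvage_items(items, numeric_fields):
    """Keep only the items whose required numeric fields hold real numbers."""
    kept = []
    for item in items:
        if all(is_number(item.get(field)) for field in numeric_fields):
            kept.append(item)
    return kept


items = [{"id": 1, "score": 85.6}, {"id": 2, "score": "N/A"}]
salvaged = salvage_items(items, ["id", "score"])
print(salvaged)
```

The second item is dropped because `"N/A"` cannot be turned into a number, while the first item passes through unchanged.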
  139. Install the optional dependencies:
  140.     pip install 'json-repair[schema]'
  141. (For CLI usage, you can also use `pipx install 'json-repair[schema]'`.)
  142. When `schema` is provided, schema guidance is always applied (for both valid and invalid JSON). Schema guidance is mutually exclusive with `strict=True`.
  143. ```
  144. from json_repair import repair_json
  145. schema = {
  146.     "type": "object",
  147.     "properties": {"value": {"type": "integer"}},
  148.     "required": ["value"],
  149. }
  150. repair_json('{"value": "1"}', schema=schema, return_objects=True)
  151. repair_json(
  152.     '{"items":[{"id":1,"score":85.6},{"id":2,"score":"N/A"}]}',
  153.     schema={
  154.         "type": "object",
  155.         "properties": {
  156.             "items": {
  157.                 "type": "array",
  158.                 "items": {
  159.                     "type": "object",
  160.                     "properties": {"id": {"type": "integer"}, "score": {"type": "number"}},
  161.                     "required": ["id", "score"],
  162.                 },
  163.             }
  164.         },
  165.         "required": ["items"],
  166.     },
  167.     schema_repair_mode="salvage",
  168.     return_objects=True,
  169. )
  170. ```
  171. Pydantic v2 model example:
  172. ```
  173. from pydantic import BaseModel, Field
  174. from json_repair import repair_json
  175. class Payload(BaseModel):
  176.     value: int
  177.     tags: list[str] = Field(default_factory=list)
  178. repair_json(
  179.     '{"value": "1", "tags": }',
  180.     schema=Payload,
  181.     skip_json_loads=True,
  182.     return_objects=True,
  183. )
  184. ```
  185. ### Use json_repair with streaming
  186. Sometimes you are streaming data and want to repair the JSON as it arrives. Normally this won't work, but you can pass `stream_stable=True` to `repair_json()` or `loads()` to make it work:
  187. ```
  188. stream_output = repair_json(stream_input, stream_stable=True)
  189. ```
  190. ### Use json_repair from CLI
  191. Install the library for command-line use with:
  192. ```
  193. pipx install json-repair
  194. ```
  195. To see all available options:
  196. ```
  197. $ json_repair -h
  198. usage: json_repair [-h] [-i] [-o TARGET] [--ensure_ascii] [--indent INDENT]
  199.                    [--skip-json-loads] [--schema SCHEMA] [--schema-model MODEL]
  200.                    [--strict] [--schema-repair-mode {standard,salvage}] [filename]
  201. Repair and parse JSON files.
  202. positional arguments:
  203.   filename              The JSON file to repair (if omitted, reads from stdin)
  204. options:
  205.   -h, --help            show this help message and exit
  206.   -i, --inline          Replace the file inline instead of returning the output to stdout
  207.   -o TARGET, --output TARGET
  208.                         If specified, the output will be written to TARGET filename instead of stdout
  209.   --ensure_ascii        Pass ensure_ascii=True to json.dumps()
  210.   --indent INDENT       Number of spaces for indentation (Default 2)
  211.   --skip-json-loads     Skip initial json.loads validation
  212.   --schema SCHEMA       Path to a JSON Schema file that guides repairs
  213.   --schema-model MODEL  Pydantic v2 model in 'module:ClassName' form that guides repairs
  214.   --strict              Raise on duplicate keys, missing separators, empty keys/values, and similar structural issues instead of repairing them
  215.   --schema-repair-mode {standard,salvage}
  216.                         Schema repair mode: standard (default) or salvage (best-effort array/object salvage)
  217. ```
  218. ## Adding to requirements
  219. **Please pin this library only on the major version!**
  220. We use TDD and strict semantic versioning, so there will be frequent updates and no breaking changes in minor and patch versions.
  221. To ensure that you only pin the major version of this library in your `requirements.txt`, specify the package name followed by the major version and a wildcard for minor and patch versions. For example:
  222.     json_repair==0.*
  223. In this example, any version that starts with `0.` will be acceptable, allowing for updates on minor and patch versions.
  224. ---
  225. # How to cite
  226. If you are using this library in your academic work (as I know many folks are), please find the BibTeX here:
  227.     @software{Baccianella_JSON_Repair_-_2025,
  228.         author = "Stefano {Baccianella}",
  229.         month = "feb",
  230.         title = "JSON Repair - A python module to repair invalid JSON, commonly used to parse the output of LLMs",
  231.         url = "https://github.com/mangiucugna/json_repair",
  232.         version = "0.39.1",
  233.         year = 2025
  234.     }
  235. Thank you for citing my work and please send me a link to the paper if you can!
  236. ---
  237. # How it works
  238. This module will parse the JSON file following the BNF definition:
  239.     <json> ::= <primitive> | <container>
  240.     <primitive> ::= <number> | <string> | <boolean>
  241.     ; Where:
  242.     ; <number> is a valid real number expressed in one of a number of given formats
  243.     ; <string> is a string of valid characters enclosed in quotes
  244.     ; <boolean> is one of the literal strings 'true', 'false', or 'null' (unquoted)
  245.     <container> ::= <object> | <array>
  246.     <array> ::= '[' [ <json> *(', ' <json>) ] ']' ; A sequence of JSON values separated by commas
  247.     <object> ::= '{' [ <member> *(', ' <member>) ] '}' ; A sequence of 'members'
  248.     <member> ::= <string> ': ' <json> ; A pair consisting of a name, and a JSON value
  249. If something is wrong (a missing parenthesis or quote, for example), it will use a few simple heuristics to fix the JSON string:
  250. - Add the missing parentheses if the parser believes that the array or object should be closed
  251. - Quote strings or add missing single quotes
  252. - Adjust whitespace and remove line breaks
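
The first heuristic can be sketched in a few lines of standard-library Python. This is a deliberately small toy, not the parser this module actually uses: `close_open_containers` and `CLOSERS` are hypothetical names, and the function only appends a missing closing quote and the closers for containers left open at the end of the input (it ignores edge cases such as input ending in a backslash).

```python
import json

CLOSERS = {"{": "}", "[": "]"}


def close_open_containers(text):
    """Append the closing quote and brackets still pending at end of input."""
    stack = []  # closers we still owe, innermost last
    in_string = False
    escaped = False
    for char in text:
        if in_string:
            if escaped:
                escaped = False  # this character was escaped, skip it
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string = True
        elif char in CLOSERS:
            stack.append(CLOSERS[char])
        elif char in "}]" and stack and stack[-1] == char:
            stack.pop()
    suffix = '"' if in_string else ""
    return text + suffix + "".join(reversed(stack))


repaired = close_open_containers('{"a": [1, 2')
print(repaired)
print(json.loads(repaired))
```

Tracking the stack of open containers is what lets the closers be emitted in the right order, innermost first.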
  253. I am sure some corner cases will be missing. If you have examples, please open an issue or, even better, push a PR.
  254. # Contributing
  255. If you want to contribute, start with `CONTRIBUTING.md` and read the Code Wiki writeup for a tour of the codebase and key entry points: https://codewiki.google/github.com/mangiucugna/json_repair
  256. # How to develop
  257. Use `uv` to set up the dev environment and run tooling:
  258.     uv sync --group dev
  259.     uv run pre-commit run --all-files
  260.     uv run pytest
  261. Make sure that the GitHub Actions running after pushing a new commit don't fail either.
  262. # How to release
  263. You will need owner access to this repository.
  264. - Edit `pyproject.toml` and update the version number appropriately using `semver` notation
  265. - **Commit and push all changes to the repository before continuing or the next steps will fail**
  266. - Run `python -m build`
  267. - Create a new release in GitHub, making sure to tag all the issues solved and contributors. Create the new tag, matching the version in the build configuration
  268. - Once the release is created, a new GitHub Actions workflow will start publishing to PyPI; make sure it doesn't fail
  269. ## Docs demo API deployment (PythonAnywhere)
  270. - The docs site is deployed by GitHub Pages (`pages-build-deployment`).
  271. - After a successful Pages deployment on `main`, `.github/workflows/pythonanywhere-sync.yml` uploads `docs/app.py` to PythonAnywhere at `/home/mangiucugna/json_repair/app.py` and reloads `mangiucugna.pythonanywhere.com`.
  272. - Required repository Actions secret: PythonAnywhere API token (`PYTHONANYWHERE_API_TOKEN`).
  273. ---
  274. # Repair JSON in other programming languages
  275. - TypeScript: https://github.com/josdejong/jsonrepair
  276. - Go: https://github.com/RealAlexandreAI/json-repair
  277. - Ruby: https://github.com/sashazykov/json-repair-rb
  278. - Rust: https://github.com/oramasearch/llm_json
  279. - R: https://github.com/cgxjdzz/jsonRepair
  280. - Java: https://github.com/du00cs/json-repairj
  281. ---
  282. ## Star History
  283. [![Star History Chart](https://api.star-history.com/svg?repos=mangiucugna/json_repair&type=Date)](https://star-history.com/#mangiucugna/json_repair&Date)