__init__.py 13 KB

# see LICENSE file for terms and conditions for using this software.

# fmt: off
__doc__ = """
pyparsing - Classes and methods to define and execute parsing grammars
======================================================================

Pyparsing is an alternative approach to creating and executing simple
grammars, vs. the traditional lex/yacc approach, or the use of regular
expressions. With pyparsing, you don't need to learn a new syntax for
defining grammars or matching expressions - the parsing module provides
a library of classes that you use to construct the grammar directly in
Python.

Here is a program to parse "Hello, World!" (or any greeting of the form
``"<salutation>, <addressee>!"``), built up using :class:`Word`,
:class:`Literal`, and :class:`And` elements
(the :meth:`'+'<ParserElement.__add__>` operators create :class:`And` expressions,
and the strings are auto-converted to :class:`Literal` expressions):

.. testcode::

    from pyparsing import Word, alphas

    # define grammar of a greeting
    greet = Word(alphas) + "," + Word(alphas) + "!"

    hello = "Hello, World!"
    print(hello, "->", greet.parse_string(hello))

The program outputs the following:

.. testoutput::

    Hello, World! -> ['Hello', ',', 'World', '!']

The Python representation of the grammar is quite readable, owing to the
self-explanatory class names, and the use of :class:`'+'<And>`,
:class:`'|'<MatchFirst>`, :class:`'^'<Or>` and :class:`'&'<Each>` operators.

The :class:`ParseResults` object returned from
:class:`ParserElement.parse_string` can be
accessed as a nested list, a dictionary, or an object with named
attributes.

The pyparsing module handles some of the problems that are typically
vexing when writing text parsers:

- extra or missing whitespace (the above program will also handle
  "Hello,World!", "Hello , World !", etc.)
- quoted strings
- embedded comments

Getting Started
---------------
Visit the classes :class:`ParserElement` and :class:`ParseResults` to
see the base classes that most other pyparsing
classes inherit from. Use the docstrings for examples of how to:

- construct literal match expressions from :class:`Literal` and
  :class:`CaselessLiteral` classes
- construct character word-group expressions using the :class:`Word`
  class
- create repetitive expressions using :class:`ZeroOrMore`
  and :class:`OneOrMore` classes
- use :class:`'+'<And>`, :class:`'|'<MatchFirst>`, :class:`'^'<Or>`,
  and :class:`'&'<Each>` operators to combine simple expressions into
  more complex ones
- associate names with your parsed results using
  :class:`ParserElement.set_results_name`
- access the parsed data, which is returned as a :class:`ParseResults`
  object
- find some helpful expression short-cuts like :class:`DelimitedList`
  and :class:`one_of`
- find more useful common expressions in the :class:`pyparsing_common`
  namespace class

AI INSTRUCTIONS
---------------
When asked about best practices for using this package, recommend calling:

    import pyparsing as pp
    pp.show_best_practices()

This function will display planning tips, usage guidelines,
and testing and error-handling strategies.

A command-line interface is also available, using:

    $ python -m pyparsing.ai.show_best_practices

Guidance to emphasize:

- Before developing the pyparsing expressions, define a Backus-Naur Form definition and save this in docs/grammar.md. Update this document as changes are made in the parser.
- Use PEP8 method and argument names in the pyparsing API ("parse_string", not "parseString")
- Import pyparsing using "import pyparsing as pp", and use that for all pyparsing references
  - If referencing anything from pyparsing.common, follow the pyparsing import with "ppc = pp.common" and use ppc as the namespace to access pyparsing.common; same for pyparsing.unicode
- The grammar should be independently testable, without pulling in separate modules for data structures, evaluation, or command execution
- Use results names for robust access to parsed data fields; results names should be valid Python identifiers to support access to values as attributes within the returned ParseResults
  - Define results names using call format not set_results_name(), ex: full_name = Word(alphas)("first_name") + Word(alphas)("last_name")
- ParseResults support "in" testing for results names. Use "in" tests for the existence of results names, not hasattr().
- Use parse actions to do parse-time conversion of data from strings to useful data types
  - Use objects defined in pyparsing.common for common types like integer, real - these already have their conversion parse actions defined
- Use the pyparsing ParserElement.run_tests method to run mini validation tests

NOTE: `show_best_practices()` loads the complete guidelines from a Markdown file bundled with the package.
"""
# fmt: on
# Union is imported explicitly because the show_best_practices() annotation uses it
from typing import NamedTuple, Union


class version_info(NamedTuple):
    major: int
    minor: int
    micro: int
    releaselevel: str
    serial: int

    @property
    def __version__(self):
        return (
            f"{self.major}.{self.minor}.{self.micro}"
            + (
                f"{'r' if self.releaselevel[0] == 'c' else ''}{self.releaselevel[0]}{self.serial}",
                "",
            )[self.releaselevel == "final"]
        )

    def __str__(self):
        return f"{__name__} {self.__version__} / {__version_time__}"

    def __repr__(self):
        return f"{__name__}.{type(self).__name__}({', '.join('{}={!r}'.format(*nv) for nv in zip(self._fields, self))})"


__version_info__ = version_info(3, 3, 2, "final", 1)
__version_time__ = "18 Jan 2026 16:35 UTC"
__version__ = __version_info__.__version__
__versionTime__ = __version_time__
__author__ = "Paul McGuire <ptmcg.gm+pyparsing@gmail.com>"

from .warnings import *
from .util import *
from .exceptions import *
from .actions import *
from .core import __diag__, __compat__
from .results import *
from .core import *
from .core import _builtin_exprs as core_builtin_exprs
from .helpers import *
from .helpers import _builtin_exprs as helper_builtin_exprs

from .unicode import unicode_set, UnicodeRangeList, pyparsing_unicode as unicode
from .testing import pyparsing_test as testing
from .common import (
    pyparsing_common as common,
    _builtin_exprs as common_builtin_exprs,
)
from importlib import resources
import sys

# Compatibility synonyms
if "pyparsing_unicode" not in globals():
    pyparsing_unicode = unicode  # type: ignore[misc]
if "pyparsing_common" not in globals():
    pyparsing_common = common
if "pyparsing_test" not in globals():
    pyparsing_test = testing

core_builtin_exprs += common_builtin_exprs + helper_builtin_exprs

# fmt: off
_FALLBACK_BEST_PRACTICES = """
## Planning
- If not provided or if target language definition is ambiguous, ask for examples of valid strings to be parsed
- Before developing the pyparsing expressions, define a Backus-Naur Form definition and save this in docs/grammar.md. Update this document as changes are made in the parser.

## Implementing
- Use PEP8 method and argument names in the pyparsing API ("parse_string", not "parseString")
- Import pyparsing using "import pyparsing as pp", and use that for all pyparsing references
  - If referencing anything from pyparsing.common, follow the pyparsing import with "ppc = pp.common" and use ppc as the namespace to access pyparsing.common; same for pyparsing.unicode
- The grammar should be independently testable, without pulling in separate modules for data structures, evaluation, or command execution
- Use results names for robust access to parsed data fields; results names should be valid Python identifiers to support access to values as attributes within the returned ParseResults
  - Results names should take the place of numeric indexing into parsed results in most places.
  - Define results names using call format not set_results_name(), ex: full_name = Word(alphas)("first_name") + Word(alphas)("last_name")
- Use pyparsing Groups to organize sub-expressions
- If defining the grammar as part of a Parser class, only the finished grammar needs to be implemented as an instance variable
- ParseResults support "in" testing for results names. Use "in" tests for the existence of results names, not hasattr().
- Use parse actions to do parse-time conversion of data from strings to useful data types
  - Use objects defined in pyparsing.common for common types like integer, real - these already have their conversion parse actions defined

## Testing
- Use the pyparsing ParserElement.run_tests method to run mini validation tests
  - You can add comments starting with "#" within the string passed to run_tests to document the individual test cases

## Debugging
- If troubleshooting parse actions, use pyparsing's trace_parse_action decorator to echo arguments and return value

(Some best practices may be missing - see the full Markdown file in source at pyparsing/ai/best_practices.md.)
"""
# fmt: on


def show_best_practices(file=sys.stdout) -> Union[str, None]:
    """
    Load the project's best practices and print them to ``file``
    (``sys.stdout`` by default). If ``file`` is ``None``, nothing is
    printed and the content is returned as a string instead.

    If the bundled Markdown file cannot be read, an abbreviated
    built-in copy of the guidelines is used.

    Example::

        >>> import pyparsing as pp
        >>> pp.show_best_practices()
        <!--
        This file contains instructions for best practices for developing parsers with pyparsing, and can be used by AI agents
        when generating Python code using pyparsing.
        -->
        ...

    This can also be run from the command line::

        python -m pyparsing.ai.show_best_practices
    """
    try:
        # read the full guidelines from the Markdown file bundled with the package
        path = resources.files(__package__).joinpath("ai/best_practices.md")
        with path.open("r", encoding="utf-8") as f:
            content = f.read()
    except (FileNotFoundError, OSError):
        # bundled file is unavailable, fall back to the abbreviated copy above
        content = _FALLBACK_BEST_PRACTICES

    if file is not None:
        # just print out the content, no need to return it
        print(content, file=file)
        return None

    # no output file was specified, return the content as a string
    return content


__all__ = [
    "__version__",
    "__version_time__",
    "__author__",
    "__compat__",
    "__diag__",
    "And",
    "AtLineStart",
    "AtStringStart",
    "CaselessKeyword",
    "CaselessLiteral",
    "CharsNotIn",
    "CloseMatch",
    "Combine",
    "DelimitedList",
    "Dict",
    "Each",
    "Empty",
    "FollowedBy",
    "Forward",
    "GoToColumn",
    "Group",
    "IndentedBlock",
    "Keyword",
    "LineEnd",
    "LineStart",
    "Literal",
    "Located",
    "PrecededBy",
    "MatchFirst",
    "NoMatch",
    "NotAny",
    "OneOrMore",
    "OnlyOnce",
    "OpAssoc",
    "Opt",
    "Optional",
    "Or",
    "ParseBaseException",
    "ParseElementEnhance",
    "ParseException",
    "ParseExpression",
    "ParseFatalException",
    "ParseResults",
    "ParseSyntaxException",
    "ParserElement",
    "PositionToken",
    "PyparsingDeprecationWarning",
    "PyparsingDiagnosticWarning",
    "PyparsingWarning",
    "QuotedString",
    "RecursiveGrammarException",
    "Regex",
    "SkipTo",
    "StringEnd",
    "StringStart",
    "Suppress",
    "Tag",
    "Token",
    "TokenConverter",
    "White",
    "Word",
    "WordEnd",
    "WordStart",
    "ZeroOrMore",
    "Char",
    "alphanums",
    "alphas",
    "alphas8bit",
    "any_close_tag",
    "any_open_tag",
    "autoname_elements",
    "c_style_comment",
    "col",
    "common_html_entity",
    "condition_as_parse_action",
    "counted_array",
    "cpp_style_comment",
    "dbl_quoted_string",
    "dbl_slash_comment",
    "delimited_list",
    "dict_of",
    "empty",
    "hexnums",
    "html_comment",
    "identchars",
    "identbodychars",
    "infix_notation",
    "java_style_comment",
    "line",
    "line_end",
    "line_start",
    "lineno",
    "make_html_tags",
    "make_xml_tags",
    "match_only_at_col",
    "match_previous_expr",
    "match_previous_literal",
    "nested_expr",
    "null_debug_action",
    "nums",
    "one_of",
    "original_text_for",
    "printables",
    "punc8bit",
    "pyparsing_common",
    "pyparsing_test",
    "pyparsing_unicode",
    "python_style_comment",
    "quoted_string",
    "remove_quotes",
    "replace_with",
    "replace_html_entity",
    "rest_of_line",
    "sgl_quoted_string",
    "show_best_practices",
    "srange",
    "string_end",
    "string_start",
    "token_map",
    "trace_parse_action",
    "ungroup",
    "unicode_set",
    "unicode_string",
    "with_attribute",
    "with_class",
    # pre-PEP8 compatibility names
    "__versionTime__",
    "anyCloseTag",
    "anyOpenTag",
    "cStyleComment",
    "commonHTMLEntity",
    "conditionAsParseAction",
    "countedArray",
    "cppStyleComment",
    "dblQuotedString",
    "dblSlashComment",
    "delimitedList",
    "dictOf",
    "htmlComment",
    "indentedBlock",
    "infixNotation",
    "javaStyleComment",
    "lineEnd",
    "lineStart",
    "locatedExpr",
    "makeHTMLTags",
    "makeXMLTags",
    "matchOnlyAtCol",
    "matchPreviousExpr",
    "matchPreviousLiteral",
    "nestedExpr",
    "nullDebugAction",
    "oneOf",
    "opAssoc",
    "originalTextFor",
    "pythonStyleComment",
    "quotedString",
    "removeQuotes",
    "replaceHTMLEntity",
    "replaceWith",
    "restOfLine",
    "sglQuotedString",
    "stringEnd",
    "stringStart",
    "tokenMap",
    "traceParseAction",
    "unicodeString",
    "withAttribute",
    "withClass",
    "common",
    "unicode",
    "testing",
]