Xiaohongshu notes working perfectly

yichael 2 months ago
parent
commit
696829b99d
100 changed files with 5873 additions and 5551 deletions
  1. 27 0
      3.0.0
  2. 12 7
      nodejs/adb/adb-interact.js
  3. 113 0
      nodejs/adb/test-adbkeyboard-send.js
  4. 1 2
      nodejs/ef-compiler/actions/adb/click.js
  5. 273 5
      nodejs/ef-compiler/actions/adb/input.js
  6. 5 1
      nodejs/ef-compiler/actions/echo-parser.js
  7. 6 2
      nodejs/ef-compiler/actions/fun/fun-parser.js
  8. 16 10
      nodejs/ef-compiler/actions/fun/ocr.js
  9. 0 4
      python/LightGlue/.flake8
  10. 0 1
      python/LightGlue/.gitattributes
  11. 0 166
      python/LightGlue/.gitignore
  12. 0 201
      python/LightGlue/LICENSE
  13. 0 183
      python/LightGlue/README.md
  14. binary
      python/LightGlue/assets/DSC_0410.JPG
  15. binary
      python/LightGlue/assets/DSC_0411.JPG
  16. 0 718
      python/LightGlue/assets/architecture.svg
  17. binary
      python/LightGlue/assets/benchmark.png
  18. binary
      python/LightGlue/assets/benchmark_cpu.png
  19. binary
      python/LightGlue/assets/easy_hard.jpg
  20. binary
      python/LightGlue/assets/sacre_coeur1.jpg
  21. binary
      python/LightGlue/assets/sacre_coeur2.jpg
  22. 0 1499
      python/LightGlue/assets/teaser.svg
  23. 0 255
      python/LightGlue/benchmark.py
  24. 0 77
      python/LightGlue/demo.ipynb
  25. 0 7
      python/LightGlue/lightglue/__init__.py
  26. 0 775
      python/LightGlue/lightglue/aliked.py
  27. 0 55
      python/LightGlue/lightglue/disk.py
  28. 0 41
      python/LightGlue/lightglue/dog_hardnet.py
  29. 0 667
      python/LightGlue/lightglue/lightglue.py
  30. 0 216
      python/LightGlue/lightglue/sift.py
  31. 0 227
      python/LightGlue/lightglue/superpoint.py
  32. 0 165
      python/LightGlue/lightglue/utils.py
  33. 0 203
      python/LightGlue/lightglue/viz2d.py
  34. 0 30
      python/LightGlue/pyproject.toml
  35. 19 0
      python/RapidOCR/.gitattributes
  36. 13 0
      python/RapidOCR/.github/FUNDING.yml
  37. 21 0
      python/RapidOCR/.github/ISSUE_TEMPLATE/bug.md
  38. 11 0
      python/RapidOCR/.github/ISSUE_TEMPLATE/config.yml
  39. 35 0
      python/RapidOCR/.github/workflows/SyncToGitee.yml
  40. 78 0
      python/RapidOCR/.github/workflows/gen_whl_to_pypi_rapidocr.yml
  41. 41 0
      python/RapidOCR/.github/workflows/push_discord.yml
  42. 32 33
      python/RapidOCR/.gitignore
  43. 18 0
      python/RapidOCR/.pre-commit-config.yaml
  44. 3 1
      python/RapidOCR/LICENSE
  45. 182 0
      python/RapidOCR/README-CN.md
  46. 179 0
      python/RapidOCR/README.md
  47. 1 0
      python/RapidOCR/android/README.md
  48. 3 0
      python/RapidOCR/api/README.md
  49. 191 0
      python/RapidOCR/assets/RapidOCRDemo.ipynb
  50. binary
      python/RapidOCR/assets/RapidOCR_LOGO.png
  51. 0 0
      python/RapidOCR/assets/colab-badge.svg
  52. 96 0
      python/RapidOCR/cliff.toml
  53. 3 0
      python/RapidOCR/cpp/README.md
  54. 25 0
      python/RapidOCR/docker/README.md
  55. 2 0
      python/RapidOCR/docker/docker_build&run.sh
  56. 3 0
      python/RapidOCR/docker/docker_stop&clean.sh
  57. 8 0
      python/RapidOCR/docker/dockerfile
  58. 250 0
      python/RapidOCR/docs/CONTRIBUTING-CN.md
  59. 250 0
      python/RapidOCR/docs/CONTRIBUTING.md
  60. 1 0
      python/RapidOCR/docs/doc_whl_rapidocr.md
  61. 1 0
      python/RapidOCR/dotnet/README.md
  62. 5 0
      python/RapidOCR/ios/README.md
  63. 3 0
      python/RapidOCR/jvm/README.md
  64. 1 0
      python/RapidOCR/ocrweb/README.md
  65. 1 0
      python/RapidOCR/python/README.md
  66. 12 0
      python/RapidOCR/python/demo.py
  67. 19 0
      python/RapidOCR/python/rapidocr/__init__.py
  68. 6 0
      python/RapidOCR/python/rapidocr/cal_rec_boxes/__init__.py
  69. 335 0
      python/RapidOCR/python/rapidocr/cal_rec_boxes/main.py
  70. 7 0
      python/RapidOCR/python/rapidocr/ch_ppocr_cls/__init__.py
  71. 98 0
      python/RapidOCR/python/rapidocr/ch_ppocr_cls/main.py
  72. 79 0
      python/RapidOCR/python/rapidocr/ch_ppocr_cls/utils.py
  73. 7 0
      python/RapidOCR/python/rapidocr/ch_ppocr_det/__init__.py
  74. 104 0
      python/RapidOCR/python/rapidocr/ch_ppocr_det/main.py
  75. 319 0
      python/RapidOCR/python/rapidocr/ch_ppocr_det/utils.py
  76. 7 0
      python/RapidOCR/python/rapidocr/ch_ppocr_rec/__init__.py
  77. 174 0
      python/RapidOCR/python/rapidocr/ch_ppocr_rec/main.py
  78. 77 0
      python/RapidOCR/python/rapidocr/ch_ppocr_rec/typings.py
  79. 186 0
      python/RapidOCR/python/rapidocr/ch_ppocr_rec/utils.py
  80. 26 0
      python/RapidOCR/python/rapidocr/cli.py
  81. 177 0
      python/RapidOCR/python/rapidocr/config.yaml
  82. 704 0
      python/RapidOCR/python/rapidocr/default_models.yaml
  83. 3 0
      python/RapidOCR/python/rapidocr/inference_engine/__init__.py
  84. 162 0
      python/RapidOCR/python/rapidocr/inference_engine/base.py
  85. 6 0
      python/RapidOCR/python/rapidocr/inference_engine/mnn/__init__.py
  86. 83 0
      python/RapidOCR/python/rapidocr/inference_engine/mnn/main.py
  87. 6 0
      python/RapidOCR/python/rapidocr/inference_engine/onnxruntime/__init__.py
  88. 111 0
      python/RapidOCR/python/rapidocr/inference_engine/onnxruntime/main.py
  89. 203 0
      python/RapidOCR/python/rapidocr/inference_engine/onnxruntime/provider_config.py
  90. 6 0
      python/RapidOCR/python/rapidocr/inference_engine/openvino/__init__.py
  91. 51 0
      python/RapidOCR/python/rapidocr/inference_engine/openvino/device_config.py
  92. 103 0
      python/RapidOCR/python/rapidocr/inference_engine/openvino/main.py
  93. 6 0
      python/RapidOCR/python/rapidocr/inference_engine/paddle/__init__.py
  94. 120 0
      python/RapidOCR/python/rapidocr/inference_engine/paddle/device_config.py
  95. 150 0
      python/RapidOCR/python/rapidocr/inference_engine/paddle/main.py
  96. 6 0
      python/RapidOCR/python/rapidocr/inference_engine/pytorch/__init__.py
  97. 100 0
      python/RapidOCR/python/rapidocr/inference_engine/pytorch/device_config.py
  98. 37 0
      python/RapidOCR/python/rapidocr/inference_engine/pytorch/main.py
  99. 3 0
      python/RapidOCR/python/rapidocr/inference_engine/pytorch/networks/__init__.py
  100. 451 0
      python/RapidOCR/python/rapidocr/inference_engine/pytorch/networks/arch_config.yaml

+ 27 - 0
3.0.0

@@ -0,0 +1,27 @@
+Collecting paddlepaddle
+  Downloading paddlepaddle-3.3.0-cp312-cp312-win_amd64.whl.metadata (8.8 kB)
+Requirement already satisfied: httpx in .\python\x64\env\Lib\site-packages (from paddlepaddle) (0.28.1)
+Requirement already satisfied: numpy>=1.21 in .\python\x64\env\Lib\site-packages (from paddlepaddle) (2.4.1)
+Requirement already satisfied: protobuf>=3.20.2 in .\python\x64\env\Lib\site-packages (from paddlepaddle) (6.33.4)
+Requirement already satisfied: Pillow in .\python\x64\env\Lib\site-packages (from paddlepaddle) (12.1.1)
+Collecting opt-einsum==3.3.0 (from paddlepaddle)
+  Downloading opt_einsum-3.3.0-py3-none-any.whl.metadata (6.5 kB)
+Collecting networkx (from paddlepaddle)
+  Using cached networkx-3.6.1-py3-none-any.whl.metadata (6.8 kB)
+Requirement already satisfied: typing-extensions in .\python\x64\env\Lib\site-packages (from paddlepaddle) (4.15.0)
+Collecting safetensors>=0.6.0 (from paddlepaddle)
+  Downloading safetensors-0.7.0-cp38-abi3-win_amd64.whl.metadata (4.2 kB)
+Requirement already satisfied: setuptools in .\python\x64\env\Lib\site-packages (from paddlepaddle) (80.10.2)
+Requirement already satisfied: anyio in .\python\x64\env\Lib\site-packages (from httpx->paddlepaddle) (4.12.1)
+Requirement already satisfied: certifi in .\python\x64\env\Lib\site-packages (from httpx->paddlepaddle) (2026.2.25)
+Requirement already satisfied: httpcore==1.* in .\python\x64\env\Lib\site-packages (from httpx->paddlepaddle) (1.0.9)
+Requirement already satisfied: idna in .\python\x64\env\Lib\site-packages (from httpx->paddlepaddle) (3.11)
+Requirement already satisfied: h11>=0.16 in .\python\x64\env\Lib\site-packages (from httpcore==1.*->httpx->paddlepaddle) (0.16.0)
+Downloading paddlepaddle-3.3.0-cp312-cp312-win_amd64.whl (104.3 MB)
+   ---------------------------------------- 104.3/104.3 MB 7.1 MB/s  0:00:14
+Downloading opt_einsum-3.3.0-py3-none-any.whl (65 kB)
+Downloading safetensors-0.7.0-cp38-abi3-win_amd64.whl (341 kB)
+Using cached networkx-3.6.1-py3-none-any.whl (2.1 MB)
+Installing collected packages: safetensors, opt-einsum, networkx, paddlepaddle
+
+Successfully installed networkx-3.6.1 opt-einsum-3.3.0 paddlepaddle-3.3.0 safetensors-0.7.0

+ 12 - 7
nodejs/adb/adb-interact.js

@@ -1,5 +1,5 @@
 #!/usr/bin/env node
-const { execSync } = require('child_process')
+const { execSync, spawnSync } = require('child_process')
 const path = require('path')
 const fs = require('fs')
 
@@ -33,14 +33,14 @@ function runShellWithOutput(adbPath, deviceFlag, shellArgs, timeout = 1000) {
   return execSync(`${quoted} ${deviceFlag}shell ${shellArgs}`, { encoding: 'utf-8', timeout })
 }
 
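-/** Handle tap: tap at the given coordinates */
+/** Handle tap: tap at the given coordinates (timeout relaxed so the tap can complete over a WiFi connection) */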
 function handleTap(adbPath, argv) {
   const coordX = argv[3]
   const coordY = argv[4]
   const deviceId = argv[5] || ''
   if (!coordX || !coordY) process.exit(1)
   const flag = getDeviceFlag(deviceId)
-  runShell(adbPath, flag, `input tap ${coordX} ${coordY}`, 1000)
+  runShell(adbPath, flag, `input tap ${coordX} ${coordY}`, 8000)
 }
 
 /** Compute the swipe start and end coordinates from direction and distance (relative to the screen center); returns [startX, startY, endX, endY] */
@@ -98,14 +98,19 @@ function handleKeyevent(adbPath, argv) {
   runShell(adbPath, flag, `input keyevent ${keyCode}`, 1000)
 }
 
-/** Handle text: type text */
+/** Simple input text (called by type:input and similar; the full logic for adb method:input lives in ef-compiler/actions/adb/input.js) */
 function handleText(adbPath, argv) {
   const text = argv[3]
   const deviceId = argv[4] || ''
   if (text === undefined) process.exit(1)
-  const flag = getDeviceFlag(deviceId)
-  const escaped = String(text).replace(/"/g, '\\"').replace(/\$/g, '\\$').replace(/`/g, '\\`')
-  runShell(adbPath, flag, `input text "${escaped}"`, 5000)
+  const str = String(text)
+  const shellArgs = deviceId && deviceId.includes(':') ? ['-s', deviceId, 'shell', 'input', 'text', str] : ['shell', 'input', 'text', str]
+  const r = spawnSync(adbPath, shellArgs, { encoding: 'utf-8', timeout: 5000 })
+  if (r.status !== 0) {
+    const err = new Error(r.stderr || r.stdout || 'input text failed')
+    err.status = r.status
+    throw err
+  }
 }
 
 const action = process.argv[2]
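Review note on the `handleText` change: moving from an `execSync` command string to a `spawnSync` argument array removes host-side shell parsing, which is why the manual escaping of `"`, `$` and backticks could be dropped. As far as I know, `adb shell` still joins its arguments and hands them to the shell on the device, so spaces, quotes and `$` in the text may still be interpreted there. A minimal sketch of quoting for the device shell follows. `quoteForDeviceShell` and `buildInputTextArgs` are hypothetical helpers, not part of this commit, and the sketch passes `-s` for any non-empty device id, which is an assumption that differs from the `includes(':')` check in the hunk.

```javascript
// Sketch only: both helpers are hypothetical and not part of the commit.

/** Wrap a string in single quotes for a POSIX shell, escaping embedded single quotes. */
function quoteForDeviceShell(text) {
  return "'" + String(text).replace(/'/g, "'\\''") + "'"
}

/** Build the argv array for `adb [-s id] shell input text <quoted text>`. */
function buildInputTextArgs(deviceId, text) {
  // Assumption: -s is passed for any non-empty id (USB serials included).
  const target = deviceId ? ['-s', deviceId] : []
  return [...target, 'shell', 'input', 'text', quoteForDeviceShell(text)]
}

console.log(buildInputTextArgs('192.168.42.129:5555', "it's $HOME"))
```

The array can be passed directly as the second argument of `spawnSync(adbPath, args, ...)`, as the new `handleText` already does.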

+ 113 - 0
nodejs/adb/test-adbkeyboard-send.js

@@ -0,0 +1,113 @@
+#!/usr/bin/env node
+/**
+ * Test sending text through ADB Keyboard: try several broadcast variants until one succeeds.
+ * Usage:   node test-adbkeyboard-send.js <device_id> [text]
+ * Example: node test-adbkeyboard-send.js 192.168.42.129:5555 创建小红书笔记
+ */
+const path = require('path')
+const fs = require('fs')
+const { spawnSync } = require('child_process')
+
+const projectRoot = path.resolve(__dirname, '..', '..')
+const config = require(path.join(projectRoot, 'configs', 'config.js'))
+const adbPath = config.adbPath?.path ? path.resolve(projectRoot, config.adbPath.path) : path.join(projectRoot, 'lib', 'scrcpy-adb', 'adb.exe')
+const PKG = 'com.android.adbkeyboard'
+const TEST_TEXT = process.argv[3] || '测试'
+
+function deviceArgs(device) {
+  return device && String(device).includes(':') ? ['-s', device] : []
+}
+
+function shell(device, cmd, timeout = 5000) {
+  const args = [...deviceArgs(device), 'shell', cmd]
+  const r = spawnSync(adbPath, args, { encoding: 'utf-8', timeout })
+  return { ok: r.status === 0, out: (r.stdout || '').trim(), err: (r.stderr || '').trim() }
+}
+
+function broadcast(device, action, extraKey, value, component) {
+  const args = [...deviceArgs(device), 'shell', 'am', 'broadcast', '-a', action, '--es', extraKey, value]
+  if (component) args.splice(args.length - 3, 0, '-n', component) // insert before '--es' so -n is not consumed as the extra key
+  const r = spawnSync(adbPath, args, { encoding: 'utf-8', timeout: 10000 })
+  return r.status === 0
+}
+
+function getReceivers(device) {
+  const { out } = shell(device, `dumpsys package ${PKG}`, 8000)
+  const receivers = []
+  const re = /Receiver #\d+.*?ComponentInfo\{([^}]+)\}/gs
+  let m
+  while ((m = re.exec(out)) !== null) {
+    const comp = m[1].trim()
+    if (comp.startsWith(PKG)) receivers.push(comp)
+  }
+  const alt = out.match(new RegExp(PKG + '/[^\\s\\n\\r]+', 'g'))
+  if (alt) alt.forEach(c => { if (!receivers.includes(c)) receivers.push(c) })
+  return [...new Set(receivers)]
+}
+
+function setIme(device, imeId) {
+  return shell(device, `ime set ${imeId}`, 5000).ok
+}
+
+function getCurrentIme(device) {
+  return shell(device, 'settings get secure default_input_method', 3000).out
+}
+
+function findAdbKeyboardIme(device) {
+  const { out } = shell(device, 'ime list -a', 5000)
+  const lines = (out || '').split(/\r?\n/)
+  for (const line of lines) {
+    const t = line.trim().replace(/:$/, '')
+    if (t.includes('/') && t.toLowerCase().includes('adbkeyboard')) return t
+  }
+  return `${PKG}/.AdbIME`
+}
+
+function main() {
+  const device = process.argv[2]
+  if (!device) {
+    console.error('用法: node test-adbkeyboard-send.js <device_id> [文本]')
+    process.exit(2)
+  }
+
+  const prevIme = getCurrentIme(device)
+  const imeId = findAdbKeyboardIme(device)
+  console.log('当前 IME:', prevIme)
+  console.log('切换到:', imeId)
+  if (!setIme(device, imeId)) {
+    console.error('切换 IME 失败')
+    process.exit(1)
+  }
+
+  const base64 = Buffer.from(TEST_TEXT, 'utf8').toString('base64')
+  const methods = [
+    { name: 'ADB_INPUT_B64 隐式', fn: () => broadcast(device, 'ADB_INPUT_B64', 'msg', base64, null) },
+    { name: `ADB_INPUT_B64 -n ${PKG}/.AdbReceiver`, fn: () => broadcast(device, 'ADB_INPUT_B64', 'msg', base64, `${PKG}/.AdbReceiver`) },
+    { name: `ADB_INPUT_B64 -n ${PKG}/.Receiver`, fn: () => broadcast(device, 'ADB_INPUT_B64', 'msg', base64, `${PKG}/.Receiver`) },
+    { name: `ADB_INPUT_B64 -n ${imeId}`, fn: () => broadcast(device, 'ADB_INPUT_B64', 'msg', base64, imeId) },
+    { name: 'ADB_INPUT_TEXT 隐式', fn: () => broadcast(device, 'ADB_INPUT_TEXT', 'msg', TEST_TEXT, null) },
+    { name: `ADB_INPUT_TEXT -n ${PKG}/.AdbReceiver`, fn: () => broadcast(device, 'ADB_INPUT_TEXT', 'msg', TEST_TEXT, `${PKG}/.AdbReceiver`) },
+  ]
+
+  const receivers = getReceivers(device)
+  console.log('包内 Receiver 组件:', receivers.length ? receivers : '(未解析到)')
+  receivers.forEach(comp => {
+    methods.push({ name: `ADB_INPUT_B64 -n ${comp}`, fn: () => broadcast(device, 'ADB_INPUT_B64', 'msg', base64, comp) })
+  })
+
+  for (const m of methods) {
+    const ok = m.fn()
+    console.log(ok ? '[OK]' : '[FAIL]', m.name)
+    if (ok) {
+      if (prevIme) setIme(device, prevIme)
+      console.log('发送成功,已切回原 IME。请查看设备输入框是否出现文字:', TEST_TEXT)
+      process.exit(0)
+    }
+  }
+
+  if (prevIme) setIme(device, prevIme)
+  console.error('所有方式均失败')
+  process.exit(1)
+}
+
+main()
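Review note on building the broadcast command: each `am broadcast` option consumes the tokens that follow it, so `-n <component>` has to sit outside the `--es <key> <value>` triple. One way to make the ordering hard to get wrong is to append the pieces in order instead of splicing into a finished array. This is a sketch under that assumption; `buildBroadcastArgs` is a hypothetical name, and the device check mirrors the `deviceArgs` helper in this file.

```javascript
// Sketch only: buildBroadcastArgs is a hypothetical helper, not part of the commit.

/** Build the adb argv for `am broadcast -a <action> [-n <component>] --es <key> <value>`. */
function buildBroadcastArgs(device, action, key, value, component) {
  // Same rule as deviceArgs() in the test script: only ids containing ':' get -s.
  const target = device && String(device).includes(':') ? ['-s', device] : []
  const args = [...target, 'shell', 'am', 'broadcast', '-a', action]
  if (component) args.push('-n', component) // explicit receiver, placed before the extras
  args.push('--es', key, value)
  return args
}

console.log(buildBroadcastArgs('192.168.42.129:5555', 'ADB_INPUT_B64', 'msg', 'aGk=', 'com.android.adbkeyboard/.AdbIME'))
```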

+ 1 - 2
nodejs/ef-compiler/actions/adb/click.js

@@ -14,9 +14,8 @@ async function run(action, ctx) {
     else position = variableContext[extractVarName(raw)]
   }
   if (!position && action.target) position = resolveValue(action.target, variableContext)
-  if (!position) return { success: false, error: 'click 操作缺少位置参数' }
+  if (position == null || position === '') return { success: false, error: 'click 操作缺少位置参数(位置变量为空或匹配未找到)' }
   if (typeof position === 'string') {
-    if (position === '') return { success: false, error: 'click 操作缺少位置参数(位置变量为空)' }
     try {
       position = JSON.parse(position)
     } catch (e) {
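Review note on the position check: the old `!position` test rejected every falsy value, while the new one rejects only `null`, `undefined` and the empty string. Elsewhere in this commit, `fun-parser.js` starts storing the image center as an `{ x, y }` object instead of a JSON string, so `click` now has to accept both forms. A minimal sketch of that normalization, assuming a hypothetical helper `normalizePosition` (the commit keeps the logic inline in `run`, and the error texts here are illustrative):

```javascript
// Sketch only: normalizePosition is a hypothetical helper, not part of the commit.

/** Accept an {x, y} object or its JSON string; return numeric coordinates or an error. */
function normalizePosition(position) {
  if (position == null || position === '') {
    return { ok: false, error: 'missing position' }
  }
  let value = position
  if (typeof value === 'string') {
    try {
      value = JSON.parse(value)
    } catch (e) {
      return { ok: false, error: 'position is not valid JSON' }
    }
  }
  if (value === null || typeof value !== 'object') {
    return { ok: false, error: 'position is not an object' }
  }
  // Treat missing or empty coordinates as invalid instead of letting them coerce to 0.
  const toNum = (v) => (v == null || v === '' ? NaN : Number(v))
  const x = toNum(value.x)
  const y = toNum(value.y)
  if (!Number.isFinite(x) || !Number.isFinite(y)) {
    return { ok: false, error: 'position has no numeric x/y' }
  }
  return { ok: true, x, y }
}

console.log(normalizePosition('{"x":540,"y":672}'))
console.log(normalizePosition({ x: 540, y: 672 }))
```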

+ 273 - 5
nodejs/ef-compiler/actions/adb/input.js

@@ -1,13 +1,235 @@
 /**
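- * adb method: input - type text (supports clear to empty the field first)
+ * adb method: input - type text via ADB Keyboard (install static/ADBKeyboard.apk first, switch the IME, then switch back to the previous input method after sending)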
  */
+const path = require('path')
+const fs = require('fs')
+const { execSync, spawnSync } = require('child_process')
+
+const ADB_KEYBOARD_IME_ID = 'com.android.adbkeyboard/.AdbKeyboard'
+const ADB_KEYBOARD_PACKAGE = 'com.android.adbkeyboard'
+const B64_CHUNK_CHARS = 200
+
+function getProjectRoot(ctx) {
+  const root = ctx.compilerConfig?.projectRoot
+  if (root && fs.existsSync(root)) return root
+  const defaultRoot = path.resolve(__dirname, '..', '..', '..', '..')
+  return defaultRoot
+}
+
+function getAdbPath(projectRoot) {
+  try {
+    const configPath = path.join(projectRoot, 'configs', 'config.js')
+    const config = fs.existsSync(configPath) ? require(configPath) : {}
+    const p = config.adbPath?.path
+    if (p) return path.isAbsolute(p) ? p : path.resolve(projectRoot, p)
+  } catch (e) {}
+  return path.join(projectRoot, 'lib', 'scrcpy-adb', process.platform === 'win32' ? 'adb.exe' : 'adb')
+}
+
+function getAdbKeyboardApkPath(projectRoot) {
+  const p = path.join(projectRoot, 'static', 'ADBKeyboard.apk')
+  return fs.existsSync(p) ? p : null
+}
+
+function deviceArgs(device) {
+  return device && String(device).includes(':') ? ['-s', device] : []
+}
+
+function runShell(adbPath, device, shellArgs, timeout = 3000) {
+  const args = [...deviceArgs(device), 'shell', shellArgs]
+  const r = spawnSync(adbPath, args, { encoding: 'utf-8', timeout })
+  if (r.status !== 0) throw new Error(r.stderr || r.stdout || `adb shell failed: ${r.status}`)
+  return (r.stdout || '').trim()
+}
+
+function runShellQuiet(adbPath, device, shellArgs, timeout = 3000) {
+  const args = [...deviceArgs(device), 'shell', shellArgs]
+  const r = spawnSync(adbPath, args, { encoding: 'utf-8', timeout })
+  return r.status === 0
+}
+
+function isPackageInstalled(adbPath, device, packageId) {
+  try {
+    const out = runShell(adbPath, device, `pm list packages ${packageId}`, 3000)
+    return (out || '').indexOf('package:' + packageId) >= 0
+  } catch (e) {
+    return false
+  }
+}
+
+function installApk(adbPath, device, apkPath) {
+  const args = [...deviceArgs(device), 'install', '-r', apkPath]
+  const r = spawnSync(adbPath, args, { encoding: 'utf-8', timeout: 60000 })
+  if (r.status !== 0) throw new Error(r.stderr || r.stdout || `adb install failed: ${r.status}`)
+}
+
+function getCurrentIme(adbPath, device) {
+  try {
+    return runShell(adbPath, device, 'settings get secure default_input_method', 3000)
+  } catch (e) {
+    return ''
+  }
+}
+
+function listEnabledImes(adbPath, device) {
+  try {
+    const out = runShell(adbPath, device, 'ime list -s', 3000)
+    return (out || '').split(/\r?\n/).map(s => s.trim()).filter(Boolean)
+  } catch (e) {
+    return []
+  }
+}
+
+function listAllImes(adbPath, device) {
+  try {
+    const out = runShell(adbPath, device, 'ime list -a', 3000)
+    return (out || '').split(/\r?\n/).map(s => s.trim()).filter(Boolean)
+  } catch (e) {
+    return []
+  }
+}
+
+/** Parse the list of IME ids from `ime list -a` output (lines usually look like "com.pkg/.Class:") */
+function parseImeIdsFromListA(out) {
+  const ids = []
+  const lines = (out || '').split(/\r?\n/)
+  for (const line of lines) {
+    const t = line.trim()
+    if (t.includes('/') && (t.endsWith(':') || !t.includes(' '))) {
+      ids.push(t.replace(/:$/, ''))
+    }
+  }
+  return ids
+}
+
+function findAdbKeyboardImeId(adbPath, device) {
+  try {
+    const out = runShell(adbPath, device, 'ime list -a', 3000)
+    const ids = parseImeIdsFromListA(out)
+    const lower = s => String(s).toLowerCase()
+    const found = ids.find(id => lower(id).includes('adbkeyboard') || lower(id).includes('adb.keyboard'))
+    return found || ADB_KEYBOARD_IME_ID
+  } catch (e) {
+    return ADB_KEYBOARD_IME_ID
+  }
+}
+
+/** Parse the package's BroadcastReceiver component names from dumpsys package, for explicit delivery with -n */
+function getPackageReceivers(adbPath, device, packageId) {
+  try {
+    const out = runShell(adbPath, device, `dumpsys package ${packageId}`, 8000)
+    const receivers = []
+    const re = /ComponentInfo\{([^}]+)\}/g
+    let m
+    while ((m = re.exec(out)) !== null) {
+      const comp = m[1].trim()
+      if (comp.startsWith(packageId + '/')) receivers.push(comp)
+    }
+    return [...new Set(receivers)]
+  } catch (e) {
+    return []
+  }
+}
+
+function enableIme(adbPath, device, imeId) {
+  if (!imeId) return
+  const args = [...deviceArgs(device), 'shell', 'ime', 'enable', imeId]
+  const r = spawnSync(adbPath, args, { encoding: 'utf-8', timeout: 5000 })
+  if (r.status !== 0 && r.stderr) {
+    throw new Error(`ime enable 失败: ${(r.stderr || r.stdout || '').trim() || r.status}`)
+  }
+}
+
+function setIme(adbPath, device, imeId) {
+  if (!imeId) return
+  const args = [...deviceArgs(device), 'shell', 'ime', 'set', imeId]
+  const r = spawnSync(adbPath, args, { encoding: 'utf-8', timeout: 5000 })
+  if (r.status !== 0) {
+    throw new Error(`ime set 失败: ${(r.stderr || r.stdout || '').trim() || r.status}`)
+  }
+}
+
+function ensureAdbKeyboardInstalled(adbPath, device, projectRoot) {
+  const apkPath = getAdbKeyboardApkPath(projectRoot)
+  if (!apkPath) throw new Error('未找到 static/ADBKeyboard.apk,请将 ADB Keyboard 的 apk 放在该路径')
+  if (!isPackageInstalled(adbPath, device, ADB_KEYBOARD_PACKAGE)) installApk(adbPath, device, apkPath)
+}
+
+/** Single broadcast (--es string extra); returns whether it succeeded */
+function tryBroadcast(adbPath, device, action, key, value, component) {
+  const args = [...deviceArgs(device), 'shell', 'am', 'broadcast', '-a', action, '--es', key, value]
+  if (component) args.splice(args.indexOf(action) + 1, 0, '-n', component)
+  const r = spawnSync(adbPath, args, { encoding: 'utf-8', timeout: 15000 })
+  return r.status === 0
+}
+
+/** ADB_INPUT_CHARS: --eia chars cp1,cp2,... (comma-separated code points; usable on devices where B64 misbehaves) */
+function tryBroadcastChars(adbPath, device, codePoints, component) {
+  const args = [...deviceArgs(device), 'shell', 'am', 'broadcast', '-a', 'ADB_INPUT_CHARS', '--eia', 'chars', codePoints.join(',')]
+  if (component) args.splice(args.indexOf('ADB_INPUT_CHARS') + 1, 0, '-n', component)
+  const r = spawnSync(adbPath, args, { encoding: 'utf-8', timeout: 15000 })
+  return r.status === 0
+}
+
+/** Try several delivery methods in turn until one succeeds: implicit first (the system dispatches to the registered Receiver), then explicit -n */
+async function sendTextViaBroadcast(adbPath, device, str, adbKeyboardImeId, logMessage, folderPath) {
+  const fromDumpsys = getPackageReceivers(adbPath, device, ADB_KEYBOARD_PACKAGE)
+  const components = [
+    null,
+    adbKeyboardImeId,
+    ...fromDumpsys,
+    `${ADB_KEYBOARD_PACKAGE}/.AdbReceiver`,
+    `${ADB_KEYBOARD_PACKAGE}/.Receiver`
+  ]
+  const dedup = [...new Set(components)]
+  for (let i = 0; i < str.length; i += B64_CHUNK_CHARS) {
+    const chunk = str.slice(i, i + B64_CHUNK_CHARS)
+    const b64 = Buffer.from(chunk, 'utf8').toString('base64')
+    let sent = false
+    let used = null
+    for (const comp of dedup) {
+      if (tryBroadcast(adbPath, device, 'ADB_INPUT_B64', 'msg', b64, comp)) {
+        sent = true
+        used = comp ? comp : 'B64隐式'
+        break
+      }
+    }
+    if (!sent && chunk.length <= 100) {
+      const codePoints = [...chunk].map(c => c.codePointAt(0))
+      for (const comp of [adbKeyboardImeId, null]) {
+        if (tryBroadcastChars(adbPath, device, codePoints, comp)) {
+          sent = true
+          used = 'ADB_INPUT_CHARS'
+          break
+        }
+      }
+    }
+    if (!sent && tryBroadcast(adbPath, device, 'ADB_INPUT_TEXT', 'msg', chunk, null)) {
+      sent = true
+      used = 'ADB_INPUT_TEXT'
+    }
+    if (!sent) throw new Error('broadcast 所有方式均失败')
+    if (i + B64_CHUNK_CHARS < str.length) await new Promise(r => setTimeout(r, 150))
+  }
+}
+
 async function run(action, ctx) {
   const { device, folderPath, variableContext, api, extractVarName, resolveValue } = ctx
+  let inputValue = null
   const inVars = action.inVars || []
-  let inputValue = inVars.length > 0 ? variableContext[extractVarName(inVars[0])] : null
+  if (inVars.length > 0) {
+    const raw = inVars[0]
+    if (raw != null && typeof raw === 'string' && !(raw.startsWith('{') && raw.endsWith('}'))) {
+      inputValue = raw
+    } else {
+      inputValue = variableContext[extractVarName(raw)]
+    }
+  }
   if (!inputValue && action.value) inputValue = resolveValue(action.value, variableContext)
   if (!inputValue) return { success: false, error: 'input 操作缺少输入内容' }
+
   if (action.clear) {
+    if (!api?.sendKeyEvent) return { success: false, error: '清空需要 sendKeyEvent API' }
     for (let i = 0; i < 200; i++) {
       const clearResult = await api.sendKeyEvent(device, '67')
       if (!clearResult.success) break
@@ -15,9 +237,55 @@ async function run(action, ctx) {
     }
     await new Promise((r) => setTimeout(r, 200))
   }
-  if (!api?.sendText) return { success: false, error: '输入 API 不可用' }
-  const textResult = await api.sendText(device, String(inputValue))
-  if (!textResult.success) return { success: false, error: `输入失败: ${textResult.error != null ? textResult.error : 'unknown'}` }
+
+  const projectRoot = getProjectRoot(ctx)
+  const adbPath = getAdbPath(projectRoot)
+  if (!fs.existsSync(adbPath)) return { success: false, error: `未找到 adb: ${adbPath}` }
+
+  const logMessage = ctx.logMessage
+  const needInstall = !isPackageInstalled(adbPath, device, ADB_KEYBOARD_PACKAGE)
+  if (needInstall) {
+    ensureAdbKeyboardInstalled(adbPath, device, projectRoot)
+  }
+
+  const adbKeyboardImeId = findAdbKeyboardImeId(adbPath, device)
+
+  try {
+    const prevIme = getCurrentIme(adbPath, device)
+    try {
+      try {
+        enableIme(adbPath, device, adbKeyboardImeId)
+      } catch (enableErr) {}
+      await new Promise(r => setTimeout(r, 400))
+      try {
+        const sizeOut = runShell(adbPath, device, 'wm size', 3000)
+        const m = (sizeOut || '').match(/(\d+)x(\d+)/)
+        if (m) {
+          const w = parseInt(m[1], 10)
+          const h = parseInt(m[2], 10)
+          const tapX = Math.floor(w / 2)
+          const tapY = Math.floor(h * 0.28)
+          runShellQuiet(adbPath, device, `input tap ${tapX} ${tapY}`, 2000)
+          await new Promise(r => setTimeout(r, 500))
+        }
+      } catch (e) {}
+      setIme(adbPath, device, adbKeyboardImeId)
+      await new Promise(r => setTimeout(r, 1000))
+      await sendTextViaBroadcast(adbPath, device, String(inputValue), adbKeyboardImeId, logMessage, folderPath)
+      await new Promise(r => setTimeout(r, 1000))
+    } finally {
+      if (prevIme && prevIme !== adbKeyboardImeId) {
+        setIme(adbPath, device, prevIme)
+      } else {
+        const enabled = listEnabledImes(adbPath, device)
+        const other = enabled.find(id => id !== adbKeyboardImeId)
+        if (other) setIme(adbPath, device, other)
+      }
+    }
+  } catch (e) {
+    const msg = e && (e.message || String(e)) || 'unknown'
+    return { success: false, error: `输入失败: ${msg}` }
+  }
   return { success: true }
 }
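Review note on chunking in `sendTextViaBroadcast`: `str.slice(i, i + B64_CHUNK_CHARS)` counts UTF-16 code units, so a chunk boundary can fall in the middle of a surrogate pair (for example an emoji), and each half is then encoded on its own. A minimal sketch of chunking by Unicode code point instead, assuming the same 200-unit limit is acceptable when counted in code points. `chunkByCodePoints` and `encodeChunks` are hypothetical names, not functions from this commit.

```javascript
// Sketch only: both helpers are hypothetical and not part of the commit.
const CHUNK_CODE_POINTS = 200 // mirrors B64_CHUNK_CHARS, but counted in code points

/** Split text into pieces of at most `size` Unicode code points. */
function chunkByCodePoints(text, size = CHUNK_CODE_POINTS) {
  // Array.from iterates a string by code point, so surrogate pairs stay together.
  const points = Array.from(String(text))
  const chunks = []
  for (let i = 0; i < points.length; i += size) {
    chunks.push(points.slice(i, i + size).join(''))
  }
  return chunks
}

/** Base64-encode each chunk as UTF-8, the payload format sent with ADB_INPUT_B64 in this file. */
function encodeChunks(text, size = CHUNK_CODE_POINTS) {
  return chunkByCodePoints(text, size).map((c) => Buffer.from(c, 'utf8').toString('base64'))
}

console.log(encodeChunks('创建小红书笔记', 3))
```

Each encoded chunk would then be sent with `tryBroadcast(adbPath, device, 'ADB_INPUT_B64', 'msg', b64, comp)` exactly as in the loop above.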
 

+ 5 - 1
nodejs/ef-compiler/actions/echo-parser.js

@@ -37,7 +37,11 @@ async function execute(action, ctx) {
   const { folderPath, variableContext, replaceVariablesInString, logMessage } = ctx
   let message = ''
   if (action.inVars && action.inVars.length > 0) {
-    message = action.inVars.map((v) => (v != null ? String(v) : '')).join(' ')
+    message = action.inVars.map((v) => {
+      if (v == null) return ''
+      if (typeof v === 'object' && v !== null) return JSON.stringify(v)
+      return String(v)
+    }).join(' ')
   } else if (action.value) {
     message = replaceVariablesInString(action.value, variableContext)
     const doubleBracePattern = /\{\{([\w-]+)\}\}/g
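Review note on the `inVars` mapping: `String(v)` on a plain object yields `[object Object]`, which is why objects now go through `JSON.stringify`. This matters because position variables are stored as `{ x, y }` objects by this commit. A standalone sketch of the same mapping, with `formatEchoArgs` as a hypothetical name:

```javascript
// Sketch only: formatEchoArgs is a hypothetical standalone version of the mapping in the hunk.

/** Join echo arguments with spaces, rendering objects as JSON and null/undefined as ''. */
function formatEchoArgs(inVars) {
  return inVars.map((v) => {
    if (v == null) return ''
    if (typeof v === 'object') return JSON.stringify(v)
    return String(v)
  }).join(' ')
}

console.log(formatEchoArgs(['center:', { x: 540, y: 672 }]))
// prints: center: {"x":540,"y":672}
```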

+ 6 - 2
nodejs/ef-compiler/actions/fun/fun-parser.js

@@ -196,6 +196,7 @@ async function runAction(action, device, folderPath, resolution, ctx) {
       folderPath,
       resolution,
       variableContext,
+      compilerConfig: ctx.compilerConfig,
       api: ctx.electronAPI,
       extractVarName: ctx.extractVarName,
       resolveValue: ctx.resolveValue,
@@ -336,8 +337,11 @@ async function run(actionType, action, ctx, device, folderPath) {
       if (!result.success) return { success: false, error: `图像中心点定位失败: ${result.error}` }
       const outputVarName = action.outVars?.[0] != null ? extractVarName(String(action.outVars[0]).trim()) : (action.variable ? extractVarName(action.variable) : null)
       if (outputVarName) {
-        variableContext[outputVarName] = result.center && typeof result.center === 'object' && result.center.x !== undefined && result.center.y !== undefined
-          ? JSON.stringify({ x: result.center.x, y: result.center.y }) : ''
+        if (result.center && typeof result.center === 'object' && result.center.x !== undefined && result.center.y !== undefined) {
+          variableContext[outputVarName] = { x: Math.round(Number(result.center.x)), y: Math.round(Number(result.center.y)) }
+        } else {
+          variableContext[outputVarName] = null
+        }
         await logOutVars(action, variableContext, folderPath)
       }
       return { success: true, result: result.center }

+ 16 - 10
nodejs/ef-compiler/actions/fun/ocr.js

@@ -1,5 +1,5 @@
 /**
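- * fun tag: ocr (OnnxOCR recognition)
+ * fun tag: ocr (RapidOCR recognition; the script is python/scripts/ocr.py)
  * 1) When image is an image path: run OCR on the image and write the full recognized text to the variable.
  * 2) When image is the text to find: run OCR on a device screenshot, locate the text in it, and write its center coordinates to the variable (requires a device).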
  */
@@ -15,7 +15,7 @@ const configPath = process.env.STATIC_ROOT
   : path.join(__dirname, '..', '..', '..', '..', 'configs', 'config.js')
 const projectRoot = path.dirname(path.dirname(path.resolve(configPath)))
 const config = fs.existsSync(configPath) ? require(configPath) : {}
-const ocrScriptPath = path.join(projectRoot, 'python', 'scripts', 'ocr-onnx.py')
+const ocrScriptPath = path.join(projectRoot, 'python', 'scripts', 'ocr.py')
 
 const tagName = 'ocr'
 
@@ -37,7 +37,7 @@ function getPythonPath() {
 }
 
 /**
- * Run OnnxOCR recognition on the given image
+ * Run RapidOCR recognition on the given image
  * @param {{ imagePath: string, folderPath?: string }} input - imagePath: image path (already resolved, relative or absolute); folderPath: flow directory
  * @returns {{ success: boolean, text?: string, error?: string }}
  */
@@ -51,7 +51,7 @@ async function executeOcr({ imagePath, folderPath }) {
   const resolvedImage = isAbsoluteOrDrive ? imagePath : (hasSubPath ? path.join(baseDir, imagePath) : path.join(baseDir, 'resources', imagePath))
 
   if (!fs.existsSync(ocrScriptPath)) {
-    return { success: false, error: `OCR 脚本不存在: ${ocrScriptPath}` }
+    return { success: false, error: `OCR 脚本不存在: ${ocrScriptPath}(请确保 python/scripts/ocr.py 存在)` }
   }
   if (!fs.existsSync(resolvedImage)) {
     return { success: false, error: `图片不存在: ${resolvedImage}` }
@@ -67,18 +67,21 @@ async function executeOcr({ imagePath, folderPath }) {
 
   const outStr = (r.stdout || '').trim()
   const errStr = (r.stderr || '').trim()
-  if (r.status !== 0) {
-    return { success: false, error: errStr || outStr || 'OCR 执行失败' }
-  }
   let out
   try {
     out = JSON.parse(outStr)
   } catch (e) {
+    if (r.status !== 0) {
+      return { success: false, error: errStr || outStr || 'OCR 执行失败' }
+    }
     return { success: false, error: `OCR 输出解析失败: ${outStr}` }
   }
   if (!out.success) {
     return { success: false, error: out.error || 'OCR 识别失败' }
   }
+  if (r.status !== 0) {
+    return { success: false, error: out.error || errStr || outStr || 'OCR 执行失败' }
+  }
   return { success: true, text: out.text != null ? String(out.text) : '' }
 }
 
@@ -106,18 +109,21 @@ async function executeOcrFindText({ device, findText, folderPath }) {
     })
     const outStr = (r.stdout || '').trim()
     const errStr = (r.stderr || '').trim()
-    if (r.status !== 0) {
-      return { success: false, error: errStr || outStr || 'OCR 查找文字失败' }
-    }
     let out
     try {
       out = JSON.parse(outStr)
     } catch (e) {
+      if (r.status !== 0) {
+        return { success: false, error: errStr || outStr || 'OCR 查找文字失败' }
+      }
       return { success: false, error: `OCR 输出解析失败: ${outStr}` }
     }
     if (!out.success || out.x == null || out.y == null) {
       return { success: false, error: out.error || '图中未找到该文字' }
     }
+    if (r.status !== 0) {
+      return { success: false, error: out.error || errStr || outStr || 'OCR 查找文字失败' }
+    }
     return { success: true, center: { x: out.x, y: out.y } }
   } finally {
     try { fs.unlinkSync(screenshotPath) } catch (_) {}
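Review note on the reordered status check in both OCR functions: previously a non-zero exit code returned stderr before stdout was looked at, so a structured `{ success: false, error }` object printed by the script was lost. The new order parses stdout first and only falls back to stderr when parsing fails. A minimal sketch of that decision order as one function, assuming `proc` has the shape returned by `child_process.spawnSync`. `interpretOcrProcess` and the English error texts are hypothetical.

```javascript
// Sketch only: interpretOcrProcess is a hypothetical helper, not part of the commit.

/** Turn a finished OCR process ({ status, stdout, stderr }) into a result object. */
function interpretOcrProcess(proc, fallbackError = 'OCR failed') {
  const outStr = (proc.stdout || '').trim()
  const errStr = (proc.stderr || '').trim()
  let out
  try {
    out = JSON.parse(outStr)
  } catch (e) {
    // No usable JSON: prefer stderr when the process also exited with an error.
    if (proc.status !== 0) return { success: false, error: errStr || outStr || fallbackError }
    return { success: false, error: `could not parse OCR output: ${outStr}` }
  }
  // Structured failure reported by the script takes priority over the exit code.
  if (!out || !out.success) return { success: false, error: (out && out.error) || fallbackError }
  if (proc.status !== 0) return { success: false, error: out.error || errStr || outStr || fallbackError }
  return { success: true, text: out.text != null ? String(out.text) : '' }
}

console.log(interpretOcrProcess({ status: 1, stdout: '{"success":false,"error":"model missing"}', stderr: 'Traceback ...' }))
```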

+ 0 - 4
python/LightGlue/.flake8

@@ -1,4 +0,0 @@
-[flake8]
-max-line-length = 88
-extend-ignore = E203
-exclude = .git,__pycache__,build,.venv/

+ 0 - 1
python/LightGlue/.gitattributes

@@ -1 +0,0 @@
-*.ipynb linguist-documentation

+ 0 - 166
python/LightGlue/.gitignore

@@ -1,166 +0,0 @@
-/data/
-/outputs/
-/lightglue/weights/
-*-checkpoint.ipynb
-*.pth
-
-# Byte-compiled / optimized / DLL files
-__pycache__/
-*.py[cod]
-*$py.class
-
-# C extensions
-*.so
-
-# Distribution / packaging
-.Python
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-wheels/
-share/python-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-MANIFEST
-
-# PyInstaller
-#  Usually these files are written by a python script from a template
-#  before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec
-
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-
-# Unit test / coverage reports
-htmlcov/
-.tox/
-.nox/
-.coverage
-.coverage.*
-.cache
-nosetests.xml
-coverage.xml
-*.cover
-*.py,cover
-.hypothesis/
-.pytest_cache/
-cover/
-
-# Translations
-*.mo
-*.pot
-
-# Django stuff:
-*.log
-local_settings.py
-db.sqlite3
-db.sqlite3-journal
-
-# Flask stuff:
-instance/
-.webassets-cache
-
-# Scrapy stuff:
-.scrapy
-
-# Sphinx documentation
-docs/_build/
-
-# PyBuilder
-.pybuilder/
-target/
-
-# Jupyter Notebook
-.ipynb_checkpoints
-
-# IPython
-profile_default/
-ipython_config.py
-
-# pyenv
-#   For a library or package, you might want to ignore these files since the code is
-#   intended to run in multiple environments; otherwise, check them in:
-# .python-version
-
-# pipenv
-#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-#   However, in case of collaboration, if having platform-specific dependencies or dependencies
-#   having no cross-platform support, pipenv may install dependencies that don't work, or not
-#   install all needed dependencies.
-#Pipfile.lock
-
-# poetry
-#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-#   This is especially recommended for binary packages to ensure reproducibility, and is more
-#   commonly ignored for libraries.
-#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
-
-# pdm
-#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-#pdm.lock
-#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
-#   in version control.
-#   https://pdm.fming.dev/#use-with-ide
-.pdm.toml
-
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
-__pypackages__/
-
-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-
-# SageMath parsed files
-*.sage.py
-
-# Environments
-.env
-.venv
-env/
-venv/
-ENV/
-env.bak/
-venv.bak/
-
-# Spyder project settings
-.spyderproject
-.spyproject
-
-# Rope project settings
-.ropeproject
-
-# mkdocs documentation
-/site
-
-# mypy
-.mypy_cache/
-.dmypy.json
-dmypy.json
-
-# Pyre type checker
-.pyre/
-
-# pytype static type analyzer
-.pytype/
-
-# Cython debug symbols
-cython_debug/
-
-# PyCharm
-#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
-#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
-#  and can be added to the global gitignore or merged into this file.  For a more nuclear
-#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
-.idea/

+ 0 - 201
python/LightGlue/LICENSE

@@ -1,201 +0,0 @@
-                                 Apache License
-                           Version 2.0, January 2004
-                        http://www.apache.org/licenses/
-
-   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
-
-   1. Definitions.
-
-      "License" shall mean the terms and conditions for use, reproduction,
-      and distribution as defined by Sections 1 through 9 of this document.
-
-      "Licensor" shall mean the copyright owner or entity authorized by
-      the copyright owner that is granting the License.
-
-      "Legal Entity" shall mean the union of the acting entity and all
-      other entities that control, are controlled by, or are under common
-      control with that entity. For the purposes of this definition,
-      "control" means (i) the power, direct or indirect, to cause the
-      direction or management of such entity, whether by contract or
-      otherwise, or (ii) ownership of fifty percent (50%) or more of the
-      outstanding shares, or (iii) beneficial ownership of such entity.
-
-      "You" (or "Your") shall mean an individual or Legal Entity
-      exercising permissions granted by this License.
-
-      "Source" form shall mean the preferred form for making modifications,
-      including but not limited to software source code, documentation
-      source, and configuration files.
-
-      "Object" form shall mean any form resulting from mechanical
-      transformation or translation of a Source form, including but
-      not limited to compiled object code, generated documentation,
-      and conversions to other media types.
-
-      "Work" shall mean the work of authorship, whether in Source or
-      Object form, made available under the License, as indicated by a
-      copyright notice that is included in or attached to the work
-      (an example is provided in the Appendix below).
-
-      "Derivative Works" shall mean any work, whether in Source or Object
-      form, that is based on (or derived from) the Work and for which the
-      editorial revisions, annotations, elaborations, or other modifications
-      represent, as a whole, an original work of authorship. For the purposes
-      of this License, Derivative Works shall not include works that remain
-      separable from, or merely link (or bind by name) to the interfaces of,
-      the Work and Derivative Works thereof.
-
-      "Contribution" shall mean any work of authorship, including
-      the original version of the Work and any modifications or additions
-      to that Work or Derivative Works thereof, that is intentionally
-      submitted to Licensor for inclusion in the Work by the copyright owner
-      or by an individual or Legal Entity authorized to submit on behalf of
-      the copyright owner. For the purposes of this definition, "submitted"
-      means any form of electronic, verbal, or written communication sent
-      to the Licensor or its representatives, including but not limited to
-      communication on electronic mailing lists, source code control systems,
-      and issue tracking systems that are managed by, or on behalf of, the
-      Licensor for the purpose of discussing and improving the Work, but
-      excluding communication that is conspicuously marked or otherwise
-      designated in writing by the copyright owner as "Not a Contribution."
-
-      "Contributor" shall mean Licensor and any individual or Legal Entity
-      on behalf of whom a Contribution has been received by Licensor and
-      subsequently incorporated within the Work.
-
-   2. Grant of Copyright License. Subject to the terms and conditions of
-      this License, each Contributor hereby grants to You a perpetual,
-      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-      copyright license to reproduce, prepare Derivative Works of,
-      publicly display, publicly perform, sublicense, and distribute the
-      Work and such Derivative Works in Source or Object form.
-
-   3. Grant of Patent License. Subject to the terms and conditions of
-      this License, each Contributor hereby grants to You a perpetual,
-      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-      (except as stated in this section) patent license to make, have made,
-      use, offer to sell, sell, import, and otherwise transfer the Work,
-      where such license applies only to those patent claims licensable
-      by such Contributor that are necessarily infringed by their
-      Contribution(s) alone or by combination of their Contribution(s)
-      with the Work to which such Contribution(s) was submitted. If You
-      institute patent litigation against any entity (including a
-      cross-claim or counterclaim in a lawsuit) alleging that the Work
-      or a Contribution incorporated within the Work constitutes direct
-      or contributory patent infringement, then any patent licenses
-      granted to You under this License for that Work shall terminate
-      as of the date such litigation is filed.
-
-   4. Redistribution. You may reproduce and distribute copies of the
-      Work or Derivative Works thereof in any medium, with or without
-      modifications, and in Source or Object form, provided that You
-      meet the following conditions:
-
-      (a) You must give any other recipients of the Work or
-          Derivative Works a copy of this License; and
-
-      (b) You must cause any modified files to carry prominent notices
-          stating that You changed the files; and
-
-      (c) You must retain, in the Source form of any Derivative Works
-          that You distribute, all copyright, patent, trademark, and
-          attribution notices from the Source form of the Work,
-          excluding those notices that do not pertain to any part of
-          the Derivative Works; and
-
-      (d) If the Work includes a "NOTICE" text file as part of its
-          distribution, then any Derivative Works that You distribute must
-          include a readable copy of the attribution notices contained
-          within such NOTICE file, excluding those notices that do not
-          pertain to any part of the Derivative Works, in at least one
-          of the following places: within a NOTICE text file distributed
-          as part of the Derivative Works; within the Source form or
-          documentation, if provided along with the Derivative Works; or,
-          within a display generated by the Derivative Works, if and
-          wherever such third-party notices normally appear. The contents
-          of the NOTICE file are for informational purposes only and
-          do not modify the License. You may add Your own attribution
-          notices within Derivative Works that You distribute, alongside
-          or as an addendum to the NOTICE text from the Work, provided
-          that such additional attribution notices cannot be construed
-          as modifying the License.
-
-      You may add Your own copyright statement to Your modifications and
-      may provide additional or different license terms and conditions
-      for use, reproduction, or distribution of Your modifications, or
-      for any such Derivative Works as a whole, provided Your use,
-      reproduction, and distribution of the Work otherwise complies with
-      the conditions stated in this License.
-
-   5. Submission of Contributions. Unless You explicitly state otherwise,
-      any Contribution intentionally submitted for inclusion in the Work
-      by You to the Licensor shall be under the terms and conditions of
-      this License, without any additional terms or conditions.
-      Notwithstanding the above, nothing herein shall supersede or modify
-      the terms of any separate license agreement you may have executed
-      with Licensor regarding such Contributions.
-
-   6. Trademarks. This License does not grant permission to use the trade
-      names, trademarks, service marks, or product names of the Licensor,
-      except as required for reasonable and customary use in describing the
-      origin of the Work and reproducing the content of the NOTICE file.
-
-   7. Disclaimer of Warranty. Unless required by applicable law or
-      agreed to in writing, Licensor provides the Work (and each
-      Contributor provides its Contributions) on an "AS IS" BASIS,
-      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
-      implied, including, without limitation, any warranties or conditions
-      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
-      PARTICULAR PURPOSE. You are solely responsible for determining the
-      appropriateness of using or redistributing the Work and assume any
-      risks associated with Your exercise of permissions under this License.
-
-   8. Limitation of Liability. In no event and under no legal theory,
-      whether in tort (including negligence), contract, or otherwise,
-      unless required by applicable law (such as deliberate and grossly
-      negligent acts) or agreed to in writing, shall any Contributor be
-      liable to You for damages, including any direct, indirect, special,
-      incidental, or consequential damages of any character arising as a
-      result of this License or out of the use or inability to use the
-      Work (including but not limited to damages for loss of goodwill,
-      work stoppage, computer failure or malfunction, or any and all
-      other commercial damages or losses), even if such Contributor
-      has been advised of the possibility of such damages.
-
-   9. Accepting Warranty or Additional Liability. While redistributing
-      the Work or Derivative Works thereof, You may choose to offer,
-      and charge a fee for, acceptance of support, warranty, indemnity,
-      or other liability obligations and/or rights consistent with this
-      License. However, in accepting such obligations, You may act only
-      on Your own behalf and on Your sole responsibility, not on behalf
-      of any other Contributor, and only if You agree to indemnify,
-      defend, and hold each Contributor harmless for any liability
-      incurred by, or claims asserted against, such Contributor by reason
-      of your accepting any such warranty or additional liability.
-
-   END OF TERMS AND CONDITIONS
-
-   APPENDIX: How to apply the Apache License to your work.
-
-      To apply the Apache License to your work, attach the following
-      boilerplate notice, with the fields enclosed by brackets "[]"
-      replaced with your own identifying information. (Don't include
-      the brackets!)  The text should be enclosed in the appropriate
-      comment syntax for the file format. We also recommend that a
-      file or class name and description of purpose be included on the
-      same "printed page" as the copyright notice for easier
-      identification within third-party archives.
-
-   Copyright 2023 ETH Zurich
-
-   Licensed under the Apache License, Version 2.0 (the "License");
-   you may not use this file except in compliance with the License.
-   You may obtain a copy of the License at
-
-       http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.

+ 0 - 183
python/LightGlue/README.md

@@ -1,183 +0,0 @@
-<p align="center">
-  <h1 align="center"><ins>LightGlue</ins> ⚡️<br>Local Feature Matching at Light Speed</h1>
-  <p align="center">
-    <a href="https://www.linkedin.com/in/philipplindenberger/">Philipp Lindenberger</a>
-    ·
-    <a href="https://psarlin.com/">Paul-Edouard&nbsp;Sarlin</a>
-    ·
-    <a href="https://www.microsoft.com/en-us/research/people/mapoll/">Marc&nbsp;Pollefeys</a>
-  </p>
-  <h2 align="center">
-    <p>ICCV 2023</p>
-    <a href="https://arxiv.org/pdf/2306.13643.pdf" align="center">Paper</a> | 
-    <a href="https://colab.research.google.com/github/cvg/LightGlue/blob/main/demo.ipynb" align="center">Colab</a> | 
-    <a href="https://huggingface.co/spaces/ETH-CVG/LightGlue" align="center">🤗 Demo </a> | 
-    <a href="https://psarlin.com/doc/LightGlue_ICCV2023_poster_compressed.pdf" align="center">Poster</a> | 
-    <a href="https://github.com/cvg/glue-factory" align="center"> ⚙️ Train your own</a>
-  </h2>
-
-</p>
-<p align="center">
-    <a href="https://arxiv.org/abs/2306.13643"><img src="assets/easy_hard.jpg" alt="example" width=80%></a>
-    <br>
-    <em>LightGlue is a deep neural network that matches sparse local features across image pairs.<br>An adaptive mechanism makes it fast for easy pairs (top) and reduces the computational complexity for difficult ones (bottom).</em>
-</p>
-
-##
-
-This repository hosts the inference code of LightGlue, a lightweight feature matcher with high accuracy and blazing fast inference. It takes as input a set of keypoints and descriptors for each image and returns the indices of corresponding points. The architecture is based on adaptive pruning techniques, in both network width and depth - [check out the paper for more details](https://arxiv.org/pdf/2306.13643.pdf).
-
-We release pretrained weights of LightGlue with [SuperPoint](https://arxiv.org/abs/1712.07629), [DISK](https://arxiv.org/abs/2006.13566), [ALIKED](https://arxiv.org/abs/2304.03608) and [SIFT](https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) local features.
-The training and evaluation code can be found in our library [glue-factory](https://github.com/cvg/glue-factory/).
-
-LightGlue is now part of 🤗 [Hugging Face Transformers](https://huggingface.co/docs/transformers/main/en/model_doc/lightglue) (credit to [@sbucaille](https://huggingface.co/stevenbucaille)!). It enables easy inference in a few lines of Python code, using `pip install transformers` ([model card](https://huggingface.co/ETH-CVG/lightglue_superpoint)).
-
-## Installation and demo [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cvg/LightGlue/blob/main/demo.ipynb) [![](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/ETH-CVG/LightGlue) 
-
-Install this repo using pip:
-
-```bash
-git clone https://github.com/cvg/LightGlue.git && cd LightGlue
-python -m pip install -e .
-```
-
-We provide a [demo notebook](demo.ipynb) which shows how to perform feature extraction and matching on an image pair.
-
-Here is a minimal script to match two images:
-
-```python
-from lightglue import LightGlue, SuperPoint, DISK, SIFT, ALIKED, DoGHardNet
-from lightglue.utils import load_image, rbd
-
-# SuperPoint+LightGlue
-extractor = SuperPoint(max_num_keypoints=2048).eval().cuda()  # load the extractor
-matcher = LightGlue(features='superpoint').eval().cuda()  # load the matcher
-
-# or DISK+LightGlue, ALIKED+LightGlue or SIFT+LightGlue
-extractor = DISK(max_num_keypoints=2048).eval().cuda()  # load the extractor
-matcher = LightGlue(features='disk').eval().cuda()  # load the matcher
-
-# load each image as a torch.Tensor on GPU with shape (3,H,W), normalized in [0,1]
-image0 = load_image('path/to/image_0.jpg').cuda()
-image1 = load_image('path/to/image_1.jpg').cuda()
-
-# extract local features
-feats0 = extractor.extract(image0)  # auto-resize the image, disable with resize=None
-feats1 = extractor.extract(image1)
-
-# match the features
-matches01 = matcher({'image0': feats0, 'image1': feats1})
-feats0, feats1, matches01 = [rbd(x) for x in [feats0, feats1, matches01]]  # remove batch dimension
-matches = matches01['matches']  # indices with shape (K,2)
-points0 = feats0['keypoints'][matches[..., 0]]  # coordinates in image #0, shape (K,2)
-points1 = feats1['keypoints'][matches[..., 1]]  # coordinates in image #1, shape (K,2)
-```
-
-We also provide a convenience method to match a pair of images:
-
-```python
-from lightglue import match_pair
-feats0, feats1, matches01 = match_pair(extractor, matcher, image0, image1)
-```
-
-##
-
-<p align="center">
-  <a href="https://arxiv.org/abs/2306.13643"><img src="assets/teaser.svg" alt="Logo" width=50%></a>
-  <br>
-  <em>LightGlue can adjust its depth (number of layers) and width (number of keypoints) per image pair, with a marginal impact on accuracy.</em>
-</p>
-
-## Advanced configuration
-
-<details>
-<summary>[Detail of all parameters - click to expand]</summary>
-
-- ```n_layers```: Number of stacked self+cross attention layers. Reduce this value for faster inference at the cost of accuracy (continuous red line in the plot above). Default: 9 (all layers).
-- ```flash```: Enable FlashAttention. Significantly increases the speed and reduces the memory consumption without any impact on accuracy. Default: True (LightGlue automatically detects if FlashAttention is available).
-- ```mp```: Enable mixed precision inference. Default: False (off)
-- ```depth_confidence```: Controls the early stopping. A lower values stops more often at earlier layers. Default: 0.95, disable with -1.
-- ```width_confidence```: Controls the iterative point pruning. A lower value prunes more points earlier. Default: 0.99, disable with -1.
-- ```filter_threshold```: Match confidence. Increase this value to obtain less, but stronger matches. Default: 0.1
-
-</details>
-
-The default values give a good trade-off between speed and accuracy. To maximize the accuracy, use all keypoints and disable the adaptive mechanisms:
-```python
-extractor = SuperPoint(max_num_keypoints=None)
-matcher = LightGlue(features='superpoint', depth_confidence=-1, width_confidence=-1)
-```
-
-To increase the speed with a small drop of accuracy, decrease the number of keypoints and lower the adaptive thresholds:
-```python
-extractor = SuperPoint(max_num_keypoints=1024)
-matcher = LightGlue(features='superpoint', depth_confidence=0.9, width_confidence=0.95)
-```
-
-The maximum speed is obtained with a combination of:
-- [FlashAttention](https://arxiv.org/abs/2205.14135): automatically used when ```torch >= 2.0``` or if [installed from source](https://github.com/HazyResearch/flash-attention#installation-and-features).
-- PyTorch compilation, available when ```torch >= 2.0```:
-```python
-matcher = matcher.eval().cuda()
-matcher.compile(mode='reduce-overhead')
-```
-For inputs with fewer than 1536 keypoints (determined experimentally), this compiles LightGlue but disables point pruning (large overhead). For larger input sizes, it automatically falls backs to eager mode with point pruning. Adaptive depths is supported for any input size.
-
-## Benchmark
-
-
-<p align="center">
-  <a><img src="assets/benchmark.png" alt="Logo" width=80%></a>
-  <br>
-  <em>Benchmark results on GPU (RTX 3080). With compilation and adaptivity, LightGlue runs at 150 FPS @ 1024 keypoints and 50 FPS @ 4096 keypoints per image. This is a 4-10x speedup over SuperGlue. </em>
-</p>
-
-<p align="center">
-  <a><img src="assets/benchmark_cpu.png" alt="Logo" width=80%></a>
-  <br>
-  <em>Benchmark results on CPU (Intel i7 10700K). LightGlue runs at 20 FPS @ 512 keypoints. </em>
-</p>
-
-Obtain the same plots for your setup using our [benchmark script](benchmark.py):
-```
-python benchmark.py [--device cuda] [--add_superglue] [--num_keypoints 512 1024 2048 4096] [--compile]
-```
-
-<details>
-<summary>[Performance tip - click to expand]</summary>
-
-Note: **Point pruning** introduces an overhead that sometimes outweighs its benefits.
-Point pruning is thus enabled only when the there are more than N keypoints in an image, where N is hardware-dependent.
-We provide defaults optimized for current hardware (RTX 30xx GPUs).
-We suggest running the benchmark script and adjusting the thresholds for your hardware by updating `LightGlue.pruning_keypoint_thresholds['cuda']`.
-
-</details>
-
-## Training and evaluation
-
-With [Glue Factory](https://github.com/cvg/glue-factory), you can train LightGlue with your own local features, on your own dataset!
-You can also evaluate it and other baselines on standard benchmarks like HPatches and MegaDepth.
-
-## Other links
-- [hloc - the visual localization toolbox](https://github.com/cvg/Hierarchical-Localization/): run LightGlue for Structure-from-Motion and visual localization.
-- [LightGlue-ONNX](https://github.com/fabio-sim/LightGlue-ONNX): export LightGlue to the Open Neural Network Exchange (ONNX) format with support for TensorRT and OpenVINO.
-- [Image Matching WebUI](https://github.com/Vincentqyw/image-matching-webui): a web GUI to easily compare different matchers, including LightGlue.
-- [kornia](https://kornia.readthedocs.io) now exposes LightGlue via the interfaces [`LightGlue`](https://kornia.readthedocs.io/en/latest/feature.html#kornia.feature.LightGlue) and [`LightGlueMatcher`](https://kornia.readthedocs.io/en/latest/feature.html#kornia.feature.LightGlueMatcher).
-
-## BibTeX citation
-If you use any ideas from the paper or code from this repo, please consider citing:
-
-```txt
-@inproceedings{lindenberger2023lightglue,
-  author    = {Philipp Lindenberger and
-               Paul-Edouard Sarlin and
-               Marc Pollefeys},
-  title     = {{LightGlue: Local Feature Matching at Light Speed}},
-  booktitle = {ICCV},
-  year      = {2023}
-}
-```
-
-
-## License
-The pre-trained weights of LightGlue and the code provided in this repository are released under the [Apache-2.0 license](./LICENSE). [DISK](https://github.com/cvlab-epfl/disk) follows this license as well but SuperPoint follows [a different, restrictive license](https://github.com/magicleap/SuperPointPretrainedNetwork/blob/master/LICENSE) (this includes its pre-trained weights and its [inference file](./lightglue/superpoint.py)). [ALIKED](https://github.com/Shiaoming/ALIKED) was published under a BSD-3-Clause license. 

binary
python/LightGlue/assets/DSC_0410.JPG


binary
python/LightGlue/assets/DSC_0411.JPG


File diff data was not shown because the file is too large
+ 0 - 718
python/LightGlue/assets/architecture.svg


binary
python/LightGlue/assets/benchmark.png


binary
python/LightGlue/assets/benchmark_cpu.png


binary
python/LightGlue/assets/easy_hard.jpg


binary
python/LightGlue/assets/sacre_coeur1.jpg


binary
python/LightGlue/assets/sacre_coeur2.jpg


+ 0 - 1499
python/LightGlue/assets/teaser.svg

@@ -1,1499 +0,0 @@
-<?xml version="1.0" encoding="utf-8" standalone="no"?>
-<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
-  "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
-<svg xmlns:xlink="http://www.w3.org/1999/xlink" width="351.50156pt" height="237.315312pt" viewBox="0 0 351.50156 237.315312" xmlns="http://www.w3.org/2000/svg" version="1.1">
- <metadata>
-  <rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://creativecommons.org/ns#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
-   <cc:Work>
-    <dc:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
-    <dc:date>2023-06-25T11:23:59.261938</dc:date>
-    <dc:format>image/svg+xml</dc:format>
-    <dc:creator>
-     <cc:Agent>
-      <dc:title>Matplotlib v3.7.1, https://matplotlib.org/</dc:title>
-     </cc:Agent>
-    </dc:creator>
-   </cc:Work>
-  </rdf:RDF>
- </metadata>
- <defs>
-  <style type="text/css">*{stroke-linejoin: round; stroke-linecap: butt}</style>
- </defs>
- <g id="figure_1">
-  <g id="patch_1">
-   <path d="M 0 237.315312 
-L 351.50156 237.315312 
-L 351.50156 0 
-L 0 0 
-z
-" style="fill: #ffffff"/>
-  </g>
-  <g id="axes_1">
-   <g id="patch_2">
-    <path d="M 38.242188 202.12 
-L 351.50156 202.12 
-L 351.50156 0 
-L 38.242188 0 
-z
-" style="fill: #f2f2f2"/>
-   </g>
-   <g id="PathCollection_1">
-    <defs>
-     <path id="md5bda44a6b" d="M 0 2.738613 
-C 0.726289 2.738613 1.422928 2.450055 1.936492 1.936492 
-C 2.450055 1.422928 2.738613 0.726289 2.738613 0 
-C 2.738613 -0.726289 2.450055 -1.422928 1.936492 -1.936492 
-C 1.422928 -2.450055 0.726289 -2.738613 0 -2.738613 
-C -0.726289 -2.738613 -1.422928 -2.450055 -1.936492 -1.936492 
-C -2.450055 -1.422928 -2.738613 -0.726289 -2.738613 0 
-C -2.738613 0.726289 -2.450055 1.422928 -1.936492 1.936492 
-C -1.422928 2.450055 -0.726289 2.738613 0 2.738613 
-z
-"/>
-    </defs>
-    <g clip-path="url(#pb46ed2897c)">
-     <use xlink:href="#md5bda44a6b" x="117.273002" y="77.281176" style="fill: #0000ff"/>
-    </g>
-   </g>
-   <g id="PathCollection_2">
-    <defs>
-     <path id="m3541600ca9" d="M -0 3.872983 
-L 3.872983 -3.872983 
-L -3.872983 -3.872983 
-z
-"/>
-    </defs>
-    <g clip-path="url(#pb46ed2897c)">
-     <use xlink:href="#m3541600ca9" x="113.203664" y="196.175294" style="fill: #008000"/>
-    </g>
-   </g>
-   <g id="PathCollection_3">
-    <defs>
-     <path id="mee49ddcd29" d="M 0 2.738613 
-C 0.726289 2.738613 1.422928 2.450055 1.936492 1.936492 
-C 2.450055 1.422928 2.738613 0.726289 2.738613 0 
-C 2.738613 -0.726289 2.450055 -1.422928 1.936492 -1.936492 
-C 1.422928 -2.450055 0.726289 -2.738613 0 -2.738613 
-C -0.726289 -2.738613 -1.422928 -2.450055 -1.936492 -1.936492 
-C -2.450055 -1.422928 -2.738613 -0.726289 -2.738613 0 
-C -2.738613 0.726289 -2.450055 1.422928 -1.936492 1.936492 
-C -1.422928 2.450055 -0.726289 2.738613 0 2.738613 
-z
-"/>
-    </defs>
-    <g clip-path="url(#pb46ed2897c)">
-     <use xlink:href="#mee49ddcd29" x="68.806591" y="41.612941"/>
-    </g>
-   </g>
-   <g id="PathCollection_4">
-    <defs>
-     <path id="m3986887d56" d="M 0 2.738613 
-C 0.726289 2.738613 1.422928 2.450055 1.936492 1.936492 
-C 2.450055 1.422928 2.738613 0.726289 2.738613 0 
-C 2.738613 -0.726289 2.450055 -1.422928 1.936492 -1.936492 
-C 1.422928 -2.450055 0.726289 -2.738613 0 -2.738613 
-C -0.726289 -2.738613 -1.422928 -2.450055 -1.936492 -1.936492 
-C -2.450055 -1.422928 -2.738613 -0.726289 -2.738613 0 
-C -2.738613 0.726289 -2.450055 1.422928 -1.936492 1.936492 
-C -1.422928 2.450055 -0.726289 2.738613 0 2.738613 
-z
-"/>
-    </defs>
-    <g clip-path="url(#pb46ed2897c)">
-     <use xlink:href="#m3986887d56" x="52.800495" y="34.479294" style="fill: #800080"/>
-    </g>
-   </g>
-   <g id="PathCollection_5">
-    <defs>
-     <path id="m73cb4f1908" d="M 0 -5.91608 
-L -1.328243 -1.828169 
-L -5.626526 -1.828169 
-L -2.149142 0.698298 
-L -3.477384 4.786209 
-L -0 2.259741 
-L 3.477384 4.786209 
-L 2.149142 0.698298 
-L 5.626526 -1.828169 
-L 1.328243 -1.828169 
-z
-"/>
-    </defs>
-    <g clip-path="url(#pb46ed2897c)">
-     <use xlink:href="#m73cb4f1908" x="289.703869" y="47.557647" style="fill: #ff0000"/>
-    </g>
-   </g>
-   <g id="matplotlib.axis_1">
-    <g id="xtick_1">
-     <g id="line2d_1">
-      <defs>
-       <path id="m69d2a2ec97" d="M 0 0 
-L 0 3.5 
-" style="stroke: #000000; stroke-width: 0.8"/>
-      </defs>
-      <g>
-       <use xlink:href="#m69d2a2ec97" x="38.242188" y="202.12" style="stroke: #000000; stroke-width: 0.8"/>
-      </g>
-     </g>
-     <g id="text_1">
-      <!-- 0 -->
-      <g transform="translate(35.060938 216.718437) scale(0.1 -0.1)">
-       <defs>
-        <path id="DejaVuSans-30" d="M 2034 4250 
-Q 1547 4250 1301 3770 
-Q 1056 3291 1056 2328 
-Q 1056 1369 1301 889 
-Q 1547 409 2034 409 
-Q 2525 409 2770 889 
-Q 3016 1369 3016 2328 
-Q 3016 3291 2770 3770 
-Q 2525 4250 2034 4250 
-z
-M 2034 4750 
-Q 2819 4750 3233 4129 
-Q 3647 3509 3647 2328 
-Q 3647 1150 3233 529 
-Q 2819 -91 2034 -91 
-Q 1250 -91 836 529 
-Q 422 1150 422 2328 
-Q 422 3509 836 4129 
-Q 1250 4750 2034 4750 
-z
-" transform="scale(0.015625)"/>
-       </defs>
-       <use xlink:href="#DejaVuSans-30"/>
-      </g>
-     </g>
-    </g>
-    <g id="xtick_2">
-     <g id="line2d_2">
-      <g>
-       <use xlink:href="#m69d2a2ec97" x="93.563757" y="202.12" style="stroke: #000000; stroke-width: 0.8"/>
-      </g>
-     </g>
-     <g id="text_2">
-      <!-- 10 -->
-      <g transform="translate(87.201257 216.718437) scale(0.1 -0.1)">
-       <defs>
-        <path id="DejaVuSans-31" d="M 794 531 
-L 1825 531 
-L 1825 4091 
-L 703 3866 
-L 703 4441 
-L 1819 4666 
-L 2450 4666 
-L 2450 531 
-L 3481 531 
-L 3481 0 
-L 794 0 
-L 794 531 
-z
-" transform="scale(0.015625)"/>
-       </defs>
-       <use xlink:href="#DejaVuSans-31"/>
-       <use xlink:href="#DejaVuSans-30" x="63.623047"/>
-      </g>
-     </g>
-    </g>
-    <g id="xtick_3">
-     <g id="line2d_3">
-      <g>
-       <use xlink:href="#m69d2a2ec97" x="148.885327" y="202.12" style="stroke: #000000; stroke-width: 0.8"/>
-      </g>
-     </g>
-     <g id="text_3">
-      <!-- 20 -->
-      <g transform="translate(142.522827 216.718437) scale(0.1 -0.1)">
-       <defs>
-        <path id="DejaVuSans-32" d="M 1228 531 
-L 3431 531 
-L 3431 0 
-L 469 0 
-L 469 531 
-Q 828 903 1448 1529 
-Q 2069 2156 2228 2338 
-Q 2531 2678 2651 2914 
-Q 2772 3150 2772 3378 
-Q 2772 3750 2511 3984 
-Q 2250 4219 1831 4219 
-Q 1534 4219 1204 4116 
-Q 875 4013 500 3803 
-L 500 4441 
-Q 881 4594 1212 4672 
-Q 1544 4750 1819 4750 
-Q 2544 4750 2975 4387 
-Q 3406 4025 3406 3419 
-Q 3406 3131 3298 2873 
-Q 3191 2616 2906 2266 
-Q 2828 2175 2409 1742 
-Q 1991 1309 1228 531 
-z
-" transform="scale(0.015625)"/>
-       </defs>
-       <use xlink:href="#DejaVuSans-32"/>
-       <use xlink:href="#DejaVuSans-30" x="63.623047"/>
-      </g>
-     </g>
-    </g>
-    <g id="xtick_4">
-     <g id="line2d_4">
-      <g>
-       <use xlink:href="#m69d2a2ec97" x="204.206897" y="202.12" style="stroke: #000000; stroke-width: 0.8"/>
-      </g>
-     </g>
-     <g id="text_4">
-      <!-- 30 -->
-      <g transform="translate(197.844397 216.718437) scale(0.1 -0.1)">
-       <defs>
-        <path id="DejaVuSans-33" d="M 2597 2516 
-Q 3050 2419 3304 2112 
-Q 3559 1806 3559 1356 
-Q 3559 666 3084 287 
-Q 2609 -91 1734 -91 
-Q 1441 -91 1130 -33 
-Q 819 25 488 141 
-L 488 750 
-Q 750 597 1062 519 
-Q 1375 441 1716 441 
-Q 2309 441 2620 675 
-Q 2931 909 2931 1356 
-Q 2931 1769 2642 2001 
-Q 2353 2234 1838 2234 
-L 1294 2234 
-L 1294 2753 
-L 1863 2753 
-Q 2328 2753 2575 2939 
-Q 2822 3125 2822 3475 
-Q 2822 3834 2567 4026 
-Q 2313 4219 1838 4219 
-Q 1578 4219 1281 4162 
-Q 984 4106 628 3988 
-L 628 4550 
-Q 988 4650 1302 4700 
-Q 1616 4750 1894 4750 
-Q 2613 4750 3031 4423 
-Q 3450 4097 3450 3541 
-Q 3450 3153 3228 2886 
-Q 3006 2619 2597 2516 
-z
-" transform="scale(0.015625)"/>
-       </defs>
-       <use xlink:href="#DejaVuSans-33"/>
-       <use xlink:href="#DejaVuSans-30" x="63.623047"/>
-      </g>
-     </g>
-    </g>
-    <g id="xtick_5">
-     <g id="line2d_5">
-      <g>
-       <use xlink:href="#m69d2a2ec97" x="259.528467" y="202.12" style="stroke: #000000; stroke-width: 0.8"/>
-      </g>
-     </g>
-     <g id="text_5">
-      <!-- 40 -->
-      <g transform="translate(253.165967 216.718437) scale(0.1 -0.1)">
-       <defs>
-        <path id="DejaVuSans-34" d="M 2419 4116 
-L 825 1625 
-L 2419 1625 
-L 2419 4116 
-z
-M 2253 4666 
-L 3047 4666 
-L 3047 1625 
-L 3713 1625 
-L 3713 1100 
-L 3047 1100 
-L 3047 0 
-L 2419 0 
-L 2419 1100 
-L 313 1100 
-L 313 1709 
-L 2253 4666 
-z
-" transform="scale(0.015625)"/>
-       </defs>
-       <use xlink:href="#DejaVuSans-34"/>
-       <use xlink:href="#DejaVuSans-30" x="63.623047"/>
-      </g>
-     </g>
-    </g>
-    <g id="xtick_6">
-     <g id="line2d_6">
-      <g>
-       <use xlink:href="#m69d2a2ec97" x="314.850037" y="202.12" style="stroke: #000000; stroke-width: 0.8"/>
-      </g>
-     </g>
-     <g id="text_6">
-      <!-- 50 -->
-      <g transform="translate(308.487537 216.718437) scale(0.1 -0.1)">
-       <defs>
-        <path id="DejaVuSans-35" d="M 691 4666 
-L 3169 4666 
-L 3169 4134 
-L 1269 4134 
-L 1269 2991 
-Q 1406 3038 1543 3061 
-Q 1681 3084 1819 3084 
-Q 2600 3084 3056 2656 
-Q 3513 2228 3513 1497 
-Q 3513 744 3044 326 
-Q 2575 -91 1722 -91 
-Q 1428 -91 1123 -41 
-Q 819 9 494 109 
-L 494 744 
-Q 775 591 1075 516 
-Q 1375 441 1709 441 
-Q 2250 441 2565 725 
-Q 2881 1009 2881 1497 
-Q 2881 1984 2565 2268 
-Q 2250 2553 1709 2553 
-Q 1456 2553 1204 2497 
-Q 953 2441 691 2322 
-L 691 4666 
-z
-" transform="scale(0.015625)"/>
-       </defs>
-       <use xlink:href="#DejaVuSans-35"/>
-       <use xlink:href="#DejaVuSans-30" x="63.623047"/>
-      </g>
-     </g>
-    </g>
-    <g id="text_7">
-     <!-- Image Pairs Per Second -->
-     <g transform="translate(106.824218 234.195781) scale(0.15 -0.15)">
-      <defs>
-       <path id="DejaVuSans-49" d="M 628 4666 
-L 1259 4666 
-L 1259 0 
-L 628 0 
-L 628 4666 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-6d" d="M 3328 2828 
-Q 3544 3216 3844 3400 
-Q 4144 3584 4550 3584 
-Q 5097 3584 5394 3201 
-Q 5691 2819 5691 2113 
-L 5691 0 
-L 5113 0 
-L 5113 2094 
-Q 5113 2597 4934 2840 
-Q 4756 3084 4391 3084 
-Q 3944 3084 3684 2787 
-Q 3425 2491 3425 1978 
-L 3425 0 
-L 2847 0 
-L 2847 2094 
-Q 2847 2600 2669 2842 
-Q 2491 3084 2119 3084 
-Q 1678 3084 1418 2786 
-Q 1159 2488 1159 1978 
-L 1159 0 
-L 581 0 
-L 581 3500 
-L 1159 3500 
-L 1159 2956 
-Q 1356 3278 1631 3431 
-Q 1906 3584 2284 3584 
-Q 2666 3584 2933 3390 
-Q 3200 3197 3328 2828 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-61" d="M 2194 1759 
-Q 1497 1759 1228 1600 
-Q 959 1441 959 1056 
-Q 959 750 1161 570 
-Q 1363 391 1709 391 
-Q 2188 391 2477 730 
-Q 2766 1069 2766 1631 
-L 2766 1759 
-L 2194 1759 
-z
-M 3341 1997 
-L 3341 0 
-L 2766 0 
-L 2766 531 
-Q 2569 213 2275 61 
-Q 1981 -91 1556 -91 
-Q 1019 -91 701 211 
-Q 384 513 384 1019 
-Q 384 1609 779 1909 
-Q 1175 2209 1959 2209 
-L 2766 2209 
-L 2766 2266 
-Q 2766 2663 2505 2880 
-Q 2244 3097 1772 3097 
-Q 1472 3097 1187 3025 
-Q 903 2953 641 2809 
-L 641 3341 
-Q 956 3463 1253 3523 
-Q 1550 3584 1831 3584 
-Q 2591 3584 2966 3190 
-Q 3341 2797 3341 1997 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-67" d="M 2906 1791 
-Q 2906 2416 2648 2759 
-Q 2391 3103 1925 3103 
-Q 1463 3103 1205 2759 
-Q 947 2416 947 1791 
-Q 947 1169 1205 825 
-Q 1463 481 1925 481 
-Q 2391 481 2648 825 
-Q 2906 1169 2906 1791 
-z
-M 3481 434 
-Q 3481 -459 3084 -895 
-Q 2688 -1331 1869 -1331 
-Q 1566 -1331 1297 -1286 
-Q 1028 -1241 775 -1147 
-L 775 -588 
-Q 1028 -725 1275 -790 
-Q 1522 -856 1778 -856 
-Q 2344 -856 2625 -561 
-Q 2906 -266 2906 331 
-L 2906 616 
-Q 2728 306 2450 153 
-Q 2172 0 1784 0 
-Q 1141 0 747 490 
-Q 353 981 353 1791 
-Q 353 2603 747 3093 
-Q 1141 3584 1784 3584 
-Q 2172 3584 2450 3431 
-Q 2728 3278 2906 2969 
-L 2906 3500 
-L 3481 3500 
-L 3481 434 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-65" d="M 3597 1894 
-L 3597 1613 
-L 953 1613 
-Q 991 1019 1311 708 
-Q 1631 397 2203 397 
-Q 2534 397 2845 478 
-Q 3156 559 3463 722 
-L 3463 178 
-Q 3153 47 2828 -22 
-Q 2503 -91 2169 -91 
-Q 1331 -91 842 396 
-Q 353 884 353 1716 
-Q 353 2575 817 3079 
-Q 1281 3584 2069 3584 
-Q 2775 3584 3186 3129 
-Q 3597 2675 3597 1894 
-z
-M 3022 2063 
-Q 3016 2534 2758 2815 
-Q 2500 3097 2075 3097 
-Q 1594 3097 1305 2825 
-Q 1016 2553 972 2059 
-L 3022 2063 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-20" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-50" d="M 1259 4147 
-L 1259 2394 
-L 2053 2394 
-Q 2494 2394 2734 2622 
-Q 2975 2850 2975 3272 
-Q 2975 3691 2734 3919 
-Q 2494 4147 2053 4147 
-L 1259 4147 
-z
-M 628 4666 
-L 2053 4666 
-Q 2838 4666 3239 4311 
-Q 3641 3956 3641 3272 
-Q 3641 2581 3239 2228 
-Q 2838 1875 2053 1875 
-L 1259 1875 
-L 1259 0 
-L 628 0 
-L 628 4666 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-69" d="M 603 3500 
-L 1178 3500 
-L 1178 0 
-L 603 0 
-L 603 3500 
-z
-M 603 4863 
-L 1178 4863 
-L 1178 4134 
-L 603 4134 
-L 603 4863 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-72" d="M 2631 2963 
-Q 2534 3019 2420 3045 
-Q 2306 3072 2169 3072 
-Q 1681 3072 1420 2755 
-Q 1159 2438 1159 1844 
-L 1159 0 
-L 581 0 
-L 581 3500 
-L 1159 3500 
-L 1159 2956 
-Q 1341 3275 1631 3429 
-Q 1922 3584 2338 3584 
-Q 2397 3584 2469 3576 
-Q 2541 3569 2628 3553 
-L 2631 2963 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-73" d="M 2834 3397 
-L 2834 2853 
-Q 2591 2978 2328 3040 
-Q 2066 3103 1784 3103 
-Q 1356 3103 1142 2972 
-Q 928 2841 928 2578 
-Q 928 2378 1081 2264 
-Q 1234 2150 1697 2047 
-L 1894 2003 
-Q 2506 1872 2764 1633 
-Q 3022 1394 3022 966 
-Q 3022 478 2636 193 
-Q 2250 -91 1575 -91 
-Q 1294 -91 989 -36 
-Q 684 19 347 128 
-L 347 722 
-Q 666 556 975 473 
-Q 1284 391 1588 391 
-Q 1994 391 2212 530 
-Q 2431 669 2431 922 
-Q 2431 1156 2273 1281 
-Q 2116 1406 1581 1522 
-L 1381 1569 
-Q 847 1681 609 1914 
-Q 372 2147 372 2553 
-Q 372 3047 722 3315 
-Q 1072 3584 1716 3584 
-Q 2034 3584 2315 3537 
-Q 2597 3491 2834 3397 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-53" d="M 3425 4513 
-L 3425 3897 
-Q 3066 4069 2747 4153 
-Q 2428 4238 2131 4238 
-Q 1616 4238 1336 4038 
-Q 1056 3838 1056 3469 
-Q 1056 3159 1242 3001 
-Q 1428 2844 1947 2747 
-L 2328 2669 
-Q 3034 2534 3370 2195 
-Q 3706 1856 3706 1288 
-Q 3706 609 3251 259 
-Q 2797 -91 1919 -91 
-Q 1588 -91 1214 -16 
-Q 841 59 441 206 
-L 441 856 
-Q 825 641 1194 531 
-Q 1563 422 1919 422 
-Q 2459 422 2753 634 
-Q 3047 847 3047 1241 
-Q 3047 1584 2836 1778 
-Q 2625 1972 2144 2069 
-L 1759 2144 
-Q 1053 2284 737 2584 
-Q 422 2884 422 3419 
-Q 422 4038 858 4394 
-Q 1294 4750 2059 4750 
-Q 2388 4750 2728 4690 
-Q 3069 4631 3425 4513 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-63" d="M 3122 3366 
-L 3122 2828 
-Q 2878 2963 2633 3030 
-Q 2388 3097 2138 3097 
-Q 1578 3097 1268 2742 
-Q 959 2388 959 1747 
-Q 959 1106 1268 751 
-Q 1578 397 2138 397 
-Q 2388 397 2633 464 
-Q 2878 531 3122 666 
-L 3122 134 
-Q 2881 22 2623 -34 
-Q 2366 -91 2075 -91 
-Q 1284 -91 818 406 
-Q 353 903 353 1747 
-Q 353 2603 823 3093 
-Q 1294 3584 2113 3584 
-Q 2378 3584 2631 3529 
-Q 2884 3475 3122 3366 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-6f" d="M 1959 3097 
-Q 1497 3097 1228 2736 
-Q 959 2375 959 1747 
-Q 959 1119 1226 758 
-Q 1494 397 1959 397 
-Q 2419 397 2687 759 
-Q 2956 1122 2956 1747 
-Q 2956 2369 2687 2733 
-Q 2419 3097 1959 3097 
-z
-M 1959 3584 
-Q 2709 3584 3137 3096 
-Q 3566 2609 3566 1747 
-Q 3566 888 3137 398 
-Q 2709 -91 1959 -91 
-Q 1206 -91 779 398 
-Q 353 888 353 1747 
-Q 353 2609 779 3096 
-Q 1206 3584 1959 3584 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-6e" d="M 3513 2113 
-L 3513 0 
-L 2938 0 
-L 2938 2094 
-Q 2938 2591 2744 2837 
-Q 2550 3084 2163 3084 
-Q 1697 3084 1428 2787 
-Q 1159 2491 1159 1978 
-L 1159 0 
-L 581 0 
-L 581 3500 
-L 1159 3500 
-L 1159 2956 
-Q 1366 3272 1645 3428 
-Q 1925 3584 2291 3584 
-Q 2894 3584 3203 3211 
-Q 3513 2838 3513 2113 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-64" d="M 2906 2969 
-L 2906 4863 
-L 3481 4863 
-L 3481 0 
-L 2906 0 
-L 2906 525 
-Q 2725 213 2448 61 
-Q 2172 -91 1784 -91 
-Q 1150 -91 751 415 
-Q 353 922 353 1747 
-Q 353 2572 751 3078 
-Q 1150 3584 1784 3584 
-Q 2172 3584 2448 3432 
-Q 2725 3281 2906 2969 
-z
-M 947 1747 
-Q 947 1113 1208 752 
-Q 1469 391 1925 391 
-Q 2381 391 2643 752 
-Q 2906 1113 2906 1747 
-Q 2906 2381 2643 2742 
-Q 2381 3103 1925 3103 
-Q 1469 3103 1208 2742 
-Q 947 2381 947 1747 
-z
-" transform="scale(0.015625)"/>
-      </defs>
-      <use xlink:href="#DejaVuSans-49"/>
-      <use xlink:href="#DejaVuSans-6d" x="29.492188"/>
-      <use xlink:href="#DejaVuSans-61" x="126.904297"/>
-      <use xlink:href="#DejaVuSans-67" x="188.183594"/>
-      <use xlink:href="#DejaVuSans-65" x="251.660156"/>
-      <use xlink:href="#DejaVuSans-20" x="313.183594"/>
-      <use xlink:href="#DejaVuSans-50" x="344.970703"/>
-      <use xlink:href="#DejaVuSans-61" x="400.773438"/>
-      <use xlink:href="#DejaVuSans-69" x="462.052734"/>
-      <use xlink:href="#DejaVuSans-72" x="489.835938"/>
-      <use xlink:href="#DejaVuSans-73" x="530.949219"/>
-      <use xlink:href="#DejaVuSans-20" x="583.048828"/>
-      <use xlink:href="#DejaVuSans-50" x="614.835938"/>
-      <use xlink:href="#DejaVuSans-65" x="671.513672"/>
-      <use xlink:href="#DejaVuSans-72" x="733.037109"/>
-      <use xlink:href="#DejaVuSans-20" x="774.150391"/>
-      <use xlink:href="#DejaVuSans-53" x="805.9375"/>
-      <use xlink:href="#DejaVuSans-65" x="869.414062"/>
-      <use xlink:href="#DejaVuSans-63" x="930.9375"/>
-      <use xlink:href="#DejaVuSans-6f" x="985.917969"/>
-      <use xlink:href="#DejaVuSans-6e" x="1047.099609"/>
-      <use xlink:href="#DejaVuSans-64" x="1110.478516"/>
-     </g>
-    </g>
-   </g>
-   <g id="matplotlib.axis_2">
-    <g id="ytick_1">
-     <g id="line2d_7">
-      <defs>
-       <path id="m433b6a5b4b" d="M 0 0 
-L -3.5 0 
-" style="stroke: #000000; stroke-width: 0.8"/>
-      </defs>
-      <g>
-       <use xlink:href="#m433b6a5b4b" x="38.242188" y="184.285882" style="stroke: #000000; stroke-width: 0.8"/>
-      </g>
-     </g>
-     <g id="text_8">
-      <!-- 64 -->
-      <g transform="translate(18.517188 188.085101) scale(0.1 -0.1)">
-       <defs>
-        <path id="DejaVuSans-36" d="M 2113 2584 
-Q 1688 2584 1439 2293 
-Q 1191 2003 1191 1497 
-Q 1191 994 1439 701 
-Q 1688 409 2113 409 
-Q 2538 409 2786 701 
-Q 3034 994 3034 1497 
-Q 3034 2003 2786 2293 
-Q 2538 2584 2113 2584 
-z
-M 3366 4563 
-L 3366 3988 
-Q 3128 4100 2886 4159 
-Q 2644 4219 2406 4219 
-Q 1781 4219 1451 3797 
-Q 1122 3375 1075 2522 
-Q 1259 2794 1537 2939 
-Q 1816 3084 2150 3084 
-Q 2853 3084 3261 2657 
-Q 3669 2231 3669 1497 
-Q 3669 778 3244 343 
-Q 2819 -91 2113 -91 
-Q 1303 -91 875 529 
-Q 447 1150 447 2328 
-Q 447 3434 972 4092 
-Q 1497 4750 2381 4750 
-Q 2619 4750 2861 4703 
-Q 3103 4656 3366 4563 
-z
-" transform="scale(0.015625)"/>
-       </defs>
-       <use xlink:href="#DejaVuSans-36"/>
-       <use xlink:href="#DejaVuSans-34" x="63.623047"/>
-      </g>
-     </g>
-    </g>
-    <g id="ytick_2">
-     <g id="line2d_8">
-      <g>
-       <use xlink:href="#m433b6a5b4b" x="38.242188" y="124.838824" style="stroke: #000000; stroke-width: 0.8"/>
-      </g>
-     </g>
-     <g id="text_9">
-      <!-- 65 -->
-      <g transform="translate(18.517188 128.638042) scale(0.1 -0.1)">
-       <use xlink:href="#DejaVuSans-36"/>
-       <use xlink:href="#DejaVuSans-35" x="63.623047"/>
-      </g>
-     </g>
-    </g>
-    <g id="ytick_3">
-     <g id="line2d_9">
-      <g>
-       <use xlink:href="#m433b6a5b4b" x="38.242188" y="65.391765" style="stroke: #000000; stroke-width: 0.8"/>
-      </g>
-     </g>
-     <g id="text_10">
-      <!-- 66 -->
-      <g transform="translate(18.517188 69.190983) scale(0.1 -0.1)">
-       <use xlink:href="#DejaVuSans-36"/>
-       <use xlink:href="#DejaVuSans-36" x="63.623047"/>
-      </g>
-     </g>
-    </g>
-    <g id="ytick_4">
-     <g id="line2d_10">
-      <g>
-       <use xlink:href="#m433b6a5b4b" x="38.242188" y="5.944706" style="stroke: #000000; stroke-width: 0.8"/>
-      </g>
-     </g>
-     <g id="text_11">
-      <!-- 67 -->
-      <g transform="translate(18.517188 9.743925) scale(0.1 -0.1)">
-       <defs>
-        <path id="DejaVuSans-37" d="M 525 4666 
-L 3525 4666 
-L 3525 4397 
-L 1831 0 
-L 1172 0 
-L 2766 4134 
-L 525 4134 
-L 525 4666 
-z
-" transform="scale(0.015625)"/>
-       </defs>
-       <use xlink:href="#DejaVuSans-36"/>
-       <use xlink:href="#DejaVuSans-37" x="63.623047"/>
-      </g>
-     </g>
-    </g>
-    <g id="text_12">
-     <!-- Relative Pose Accuracy [%] -->
-     <g transform="translate(11.397656 203.038906) rotate(-90) scale(0.15 -0.15)">
-      <defs>
-       <path id="DejaVuSans-52" d="M 2841 2188 
-Q 3044 2119 3236 1894 
-Q 3428 1669 3622 1275 
-L 4263 0 
-L 3584 0 
-L 2988 1197 
-Q 2756 1666 2539 1819 
-Q 2322 1972 1947 1972 
-L 1259 1972 
-L 1259 0 
-L 628 0 
-L 628 4666 
-L 2053 4666 
-Q 2853 4666 3247 4331 
-Q 3641 3997 3641 3322 
-Q 3641 2881 3436 2590 
-Q 3231 2300 2841 2188 
-z
-M 1259 4147 
-L 1259 2491 
-L 2053 2491 
-Q 2509 2491 2742 2702 
-Q 2975 2913 2975 3322 
-Q 2975 3731 2742 3939 
-Q 2509 4147 2053 4147 
-L 1259 4147 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-6c" d="M 603 4863 
-L 1178 4863 
-L 1178 0 
-L 603 0 
-L 603 4863 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-74" d="M 1172 4494 
-L 1172 3500 
-L 2356 3500 
-L 2356 3053 
-L 1172 3053 
-L 1172 1153 
-Q 1172 725 1289 603 
-Q 1406 481 1766 481 
-L 2356 481 
-L 2356 0 
-L 1766 0 
-Q 1100 0 847 248 
-Q 594 497 594 1153 
-L 594 3053 
-L 172 3053 
-L 172 3500 
-L 594 3500 
-L 594 4494 
-L 1172 4494 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-76" d="M 191 3500 
-L 800 3500 
-L 1894 563 
-L 2988 3500 
-L 3597 3500 
-L 2284 0 
-L 1503 0 
-L 191 3500 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-41" d="M 2188 4044 
-L 1331 1722 
-L 3047 1722 
-L 2188 4044 
-z
-M 1831 4666 
-L 2547 4666 
-L 4325 0 
-L 3669 0 
-L 3244 1197 
-L 1141 1197 
-L 716 0 
-L 50 0 
-L 1831 4666 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-75" d="M 544 1381 
-L 544 3500 
-L 1119 3500 
-L 1119 1403 
-Q 1119 906 1312 657 
-Q 1506 409 1894 409 
-Q 2359 409 2629 706 
-Q 2900 1003 2900 1516 
-L 2900 3500 
-L 3475 3500 
-L 3475 0 
-L 2900 0 
-L 2900 538 
-Q 2691 219 2414 64 
-Q 2138 -91 1772 -91 
-Q 1169 -91 856 284 
-Q 544 659 544 1381 
-z
-M 1991 3584 
-L 1991 3584 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-79" d="M 2059 -325 
-Q 1816 -950 1584 -1140 
-Q 1353 -1331 966 -1331 
-L 506 -1331 
-L 506 -850 
-L 844 -850 
-Q 1081 -850 1212 -737 
-Q 1344 -625 1503 -206 
-L 1606 56 
-L 191 3500 
-L 800 3500 
-L 1894 763 
-L 2988 3500 
-L 3597 3500 
-L 2059 -325 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-5b" d="M 550 4863 
-L 1875 4863 
-L 1875 4416 
-L 1125 4416 
-L 1125 -397 
-L 1875 -397 
-L 1875 -844 
-L 550 -844 
-L 550 4863 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-25" d="M 4653 2053 
-Q 4381 2053 4226 1822 
-Q 4072 1591 4072 1178 
-Q 4072 772 4226 539 
-Q 4381 306 4653 306 
-Q 4919 306 5073 539 
-Q 5228 772 5228 1178 
-Q 5228 1588 5073 1820 
-Q 4919 2053 4653 2053 
-z
-M 4653 2450 
-Q 5147 2450 5437 2106 
-Q 5728 1763 5728 1178 
-Q 5728 594 5436 251 
-Q 5144 -91 4653 -91 
-Q 4153 -91 3862 251 
-Q 3572 594 3572 1178 
-Q 3572 1766 3864 2108 
-Q 4156 2450 4653 2450 
-z
-M 1428 4353 
-Q 1159 4353 1004 4120 
-Q 850 3888 850 3481 
-Q 850 3069 1003 2837 
-Q 1156 2606 1428 2606 
-Q 1700 2606 1854 2837 
-Q 2009 3069 2009 3481 
-Q 2009 3884 1853 4118 
-Q 1697 4353 1428 4353 
-z
-M 4250 4750 
-L 4750 4750 
-L 1831 -91 
-L 1331 -91 
-L 4250 4750 
-z
-M 1428 4750 
-Q 1922 4750 2215 4408 
-Q 2509 4066 2509 3481 
-Q 2509 2891 2217 2550 
-Q 1925 2209 1428 2209 
-Q 931 2209 642 2551 
-Q 353 2894 353 3481 
-Q 353 4063 643 4406 
-Q 934 4750 1428 4750 
-z
-" transform="scale(0.015625)"/>
-       <path id="DejaVuSans-5d" d="M 1947 4863 
-L 1947 -844 
-L 622 -844 
-L 622 -397 
-L 1369 -397 
-L 1369 4416 
-L 622 4416 
-L 622 4863 
-L 1947 4863 
-z
-" transform="scale(0.015625)"/>
-      </defs>
-      <use xlink:href="#DejaVuSans-52"/>
-      <use xlink:href="#DejaVuSans-65" x="64.982422"/>
-      <use xlink:href="#DejaVuSans-6c" x="126.505859"/>
-      <use xlink:href="#DejaVuSans-61" x="154.289062"/>
-      <use xlink:href="#DejaVuSans-74" x="215.568359"/>
-      <use xlink:href="#DejaVuSans-69" x="254.777344"/>
-      <use xlink:href="#DejaVuSans-76" x="282.560547"/>
-      <use xlink:href="#DejaVuSans-65" x="341.740234"/>
-      <use xlink:href="#DejaVuSans-20" x="403.263672"/>
-      <use xlink:href="#DejaVuSans-50" x="435.050781"/>
-      <use xlink:href="#DejaVuSans-6f" x="491.728516"/>
-      <use xlink:href="#DejaVuSans-73" x="552.910156"/>
-      <use xlink:href="#DejaVuSans-65" x="605.009766"/>
-      <use xlink:href="#DejaVuSans-20" x="666.533203"/>
-      <use xlink:href="#DejaVuSans-41" x="698.320312"/>
-      <use xlink:href="#DejaVuSans-63" x="764.978516"/>
-      <use xlink:href="#DejaVuSans-63" x="819.958984"/>
-      <use xlink:href="#DejaVuSans-75" x="874.939453"/>
-      <use xlink:href="#DejaVuSans-72" x="938.318359"/>
-      <use xlink:href="#DejaVuSans-61" x="979.431641"/>
-      <use xlink:href="#DejaVuSans-63" x="1040.710938"/>
-      <use xlink:href="#DejaVuSans-79" x="1095.691406"/>
-      <use xlink:href="#DejaVuSans-20" x="1154.871094"/>
-      <use xlink:href="#DejaVuSans-5b" x="1186.658203"/>
-      <use xlink:href="#DejaVuSans-25" x="1225.671875"/>
-      <use xlink:href="#DejaVuSans-5d" x="1320.691406"/>
-     </g>
-    </g>
-   </g>
-   <g id="patch_3">
-    <path d="M 38.242188 202.12 
-L 38.242188 0 
-" style="fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square"/>
-   </g>
-   <g id="patch_4">
-    <path d="M 351.50156 202.12 
-L 351.50156 0 
-" style="fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square"/>
-   </g>
-   <g id="patch_5">
-    <path d="M 38.242188 202.12 
-L 351.50156 202.12 
-" style="fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square"/>
-   </g>
-   <g id="patch_6">
-    <path d="M 38.242188 0 
-L 351.50156 0 
-" style="fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square"/>
-   </g>
-   <g id="text_13">
-    <!-- SuperGlue -->
-    <g style="fill: #0000ff" transform="translate(73.036283 100.678833) scale(0.15 -0.15)">
-     <defs>
-      <path id="DejaVuSans-70" d="M 1159 525 
-L 1159 -1331 
-L 581 -1331 
-L 581 3500 
-L 1159 3500 
-L 1159 2969 
-Q 1341 3281 1617 3432 
-Q 1894 3584 2278 3584 
-Q 2916 3584 3314 3078 
-Q 3713 2572 3713 1747 
-Q 3713 922 3314 415 
-Q 2916 -91 2278 -91 
-Q 1894 -91 1617 61 
-Q 1341 213 1159 525 
-z
-M 3116 1747 
-Q 3116 2381 2855 2742 
-Q 2594 3103 2138 3103 
-Q 1681 3103 1420 2742 
-Q 1159 2381 1159 1747 
-Q 1159 1113 1420 752 
-Q 1681 391 2138 391 
-Q 2594 391 2855 752 
-Q 3116 1113 3116 1747 
-z
-" transform="scale(0.015625)"/>
-      <path id="DejaVuSans-47" d="M 3809 666 
-L 3809 1919 
-L 2778 1919 
-L 2778 2438 
-L 4434 2438 
-L 4434 434 
-Q 4069 175 3628 42 
-Q 3188 -91 2688 -91 
-Q 1594 -91 976 548 
-Q 359 1188 359 2328 
-Q 359 3472 976 4111 
-Q 1594 4750 2688 4750 
-Q 3144 4750 3555 4637 
-Q 3966 4525 4313 4306 
-L 4313 3634 
-Q 3963 3931 3569 4081 
-Q 3175 4231 2741 4231 
-Q 1884 4231 1454 3753 
-Q 1025 3275 1025 2328 
-Q 1025 1384 1454 906 
-Q 1884 428 2741 428 
-Q 3075 428 3337 486 
-Q 3600 544 3809 666 
-z
-" transform="scale(0.015625)"/>
-     </defs>
-     <use xlink:href="#DejaVuSans-53"/>
-     <use xlink:href="#DejaVuSans-75" x="63.476562"/>
-     <use xlink:href="#DejaVuSans-70" x="126.855469"/>
-     <use xlink:href="#DejaVuSans-65" x="190.332031"/>
-     <use xlink:href="#DejaVuSans-72" x="251.855469"/>
-     <use xlink:href="#DejaVuSans-47" x="292.96875"/>
-     <use xlink:href="#DejaVuSans-6c" x="370.458984"/>
-     <use xlink:href="#DejaVuSans-75" x="398.242188"/>
-     <use xlink:href="#DejaVuSans-65" x="461.621094"/>
-    </g>
-   </g>
-   <g id="text_14">
-    <!-- SGMNet -->
-    <g style="fill: #008000" transform="translate(87.993899 188.055763) scale(0.15 -0.15)">
-     <defs>
-      <path id="DejaVuSans-4d" d="M 628 4666 
-L 1569 4666 
-L 2759 1491 
-L 3956 4666 
-L 4897 4666 
-L 4897 0 
-L 4281 0 
-L 4281 4097 
-L 3078 897 
-L 2444 897 
-L 1241 4097 
-L 1241 0 
-L 628 0 
-L 628 4666 
-z
-" transform="scale(0.015625)"/>
-      <path id="DejaVuSans-4e" d="M 628 4666 
-L 1478 4666 
-L 3547 763 
-L 3547 4666 
-L 4159 4666 
-L 4159 0 
-L 3309 0 
-L 1241 3903 
-L 1241 0 
-L 628 0 
-L 628 4666 
-z
-" transform="scale(0.015625)"/>
-     </defs>
-     <use xlink:href="#DejaVuSans-53"/>
-     <use xlink:href="#DejaVuSans-47" x="63.476562"/>
-     <use xlink:href="#DejaVuSans-4d" x="140.966797"/>
-     <use xlink:href="#DejaVuSans-4e" x="227.246094"/>
-     <use xlink:href="#DejaVuSans-65" x="302.050781"/>
-     <use xlink:href="#DejaVuSans-74" x="363.574219"/>
-    </g>
-   </g>
-   <g id="text_15">
-    <!-- LoFTR -->
-    <g transform="translate(46.195263 63.010597) scale(0.15 -0.15)">
-     <defs>
-      <path id="DejaVuSans-4c" d="M 628 4666 
-L 1259 4666 
-L 1259 531 
-L 3531 531 
-L 3531 0 
-L 628 0 
-L 628 4666 
-z
-" transform="scale(0.015625)"/>
-      <path id="DejaVuSans-46" d="M 628 4666 
-L 3309 4666 
-L 3309 4134 
-L 1259 4134 
-L 1259 2759 
-L 3109 2759 
-L 3109 2228 
-L 1259 2228 
-L 1259 0 
-L 628 0 
-L 628 4666 
-z
-" transform="scale(0.015625)"/>
-      <path id="DejaVuSans-54" d="M -19 4666 
-L 3928 4666 
-L 3928 4134 
-L 2272 4134 
-L 2272 0 
-L 1638 0 
-L 1638 4134 
-L -19 4134 
-L -19 4666 
-z
-" transform="scale(0.015625)"/>
-     </defs>
-     <use xlink:href="#DejaVuSans-4c"/>
-     <use xlink:href="#DejaVuSans-6f" x="53.962891"/>
-     <use xlink:href="#DejaVuSans-46" x="115.144531"/>
-     <use xlink:href="#DejaVuSans-54" x="170.914062"/>
-     <use xlink:href="#DejaVuSans-52" x="231.998047"/>
-    </g>
-   </g>
-   <g id="text_16">
-    <!-- MatchFormer -->
-    <g style="fill: #800080" transform="translate(42.800495 23.359763) scale(0.15 -0.15)">
-     <defs>
-      <path id="DejaVuSans-68" d="M 3513 2113 
-L 3513 0 
-L 2938 0 
-L 2938 2094 
-Q 2938 2591 2744 2837 
-Q 2550 3084 2163 3084 
-Q 1697 3084 1428 2787 
-Q 1159 2491 1159 1978 
-L 1159 0 
-L 581 0 
-L 581 4863 
-L 1159 4863 
-L 1159 2956 
-Q 1366 3272 1645 3428 
-Q 1925 3584 2291 3584 
-Q 2894 3584 3203 3211 
-Q 3513 2838 3513 2113 
-z
-" transform="scale(0.015625)"/>
-     </defs>
-     <use xlink:href="#DejaVuSans-4d"/>
-     <use xlink:href="#DejaVuSans-61" x="86.279297"/>
-     <use xlink:href="#DejaVuSans-74" x="147.558594"/>
-     <use xlink:href="#DejaVuSans-63" x="186.767578"/>
-     <use xlink:href="#DejaVuSans-68" x="241.748047"/>
-     <use xlink:href="#DejaVuSans-46" x="305.126953"/>
-     <use xlink:href="#DejaVuSans-6f" x="359.021484"/>
-     <use xlink:href="#DejaVuSans-72" x="420.203125"/>
-     <use xlink:href="#DejaVuSans-6d" x="459.566406"/>
-     <use xlink:href="#DejaVuSans-65" x="556.978516"/>
-     <use xlink:href="#DejaVuSans-72" x="618.501953"/>
-    </g>
-   </g>
-   <g id="text_17">
-    <!-- L=3 -->
-    <g style="fill: #ff0000" transform="translate(318.963638 198.045257) scale(0.1 -0.1)">
-     <defs>
-      <path id="DejaVuSans-3d" d="M 678 2906 
-L 4684 2906 
-L 4684 2381 
-L 678 2381 
-L 678 2906 
-z
-M 678 1631 
-L 4684 1631 
-L 4684 1100 
-L 678 1100 
-L 678 1631 
-z
-" transform="scale(0.015625)"/>
-     </defs>
-     <use xlink:href="#DejaVuSans-4c"/>
-     <use xlink:href="#DejaVuSans-3d" x="55.712891"/>
-     <use xlink:href="#DejaVuSans-33" x="139.501953"/>
-    </g>
-   </g>
-   <g id="text_18">
-    <!-- L=5 -->
-    <g style="fill: #ff0000" transform="translate(228.688766 138.598199) scale(0.1 -0.1)">
-     <use xlink:href="#DejaVuSans-4c"/>
-     <use xlink:href="#DejaVuSans-3d" x="55.712891"/>
-     <use xlink:href="#DejaVuSans-35" x="139.501953"/>
-    </g>
-   </g>
-   <g id="text_19">
-    <!-- L=7 -->
-    <g style="fill: #ff0000" transform="translate(171.91046 67.261728) scale(0.1 -0.1)">
-     <use xlink:href="#DejaVuSans-4c"/>
-     <use xlink:href="#DejaVuSans-3d" x="55.712891"/>
-     <use xlink:href="#DejaVuSans-37" x="139.501953"/>
-    </g>
-   </g>
-   <g id="text_20">
-    <!-- L=9 -->
-    <g style="fill: #ff0000" transform="translate(145.090048 37.538199) scale(0.1 -0.1)">
-     <defs>
-      <path id="DejaVuSans-39" d="M 703 97 
-L 703 672 
-Q 941 559 1184 500 
-Q 1428 441 1663 441 
-Q 2288 441 2617 861 
-Q 2947 1281 2994 2138 
-Q 2813 1869 2534 1725 
-Q 2256 1581 1919 1581 
-Q 1219 1581 811 2004 
-Q 403 2428 403 3163 
-Q 403 3881 828 4315 
-Q 1253 4750 1959 4750 
-Q 2769 4750 3195 4129 
-Q 3622 3509 3622 2328 
-Q 3622 1225 3098 567 
-Q 2575 -91 1691 -91 
-Q 1453 -91 1209 -44 
-Q 966 3 703 97 
-z
-M 1959 2075 
-Q 2384 2075 2632 2365 
-Q 2881 2656 2881 3163 
-Q 2881 3666 2632 3958 
-Q 2384 4250 1959 4250 
-Q 1534 4250 1286 3958 
-Q 1038 3666 1038 3163 
-Q 1038 2656 1286 2365 
-Q 1534 2075 1959 2075 
-z
-" transform="scale(0.015625)"/>
-     </defs>
-     <use xlink:href="#DejaVuSans-4c"/>
-     <use xlink:href="#DejaVuSans-3d" x="55.712891"/>
-     <use xlink:href="#DejaVuSans-39" x="139.501953"/>
-    </g>
-   </g>
-   <g id="text_21">
-    <!-- fixed-depth -->
-    <g style="fill: #ff0000" transform="translate(225.255342 166.790662) scale(0.12 -0.12)">
-     <defs>
-      <path id="DejaVuSans-66" d="M 2375 4863 
-L 2375 4384 
-L 1825 4384 
-Q 1516 4384 1395 4259 
-Q 1275 4134 1275 3809 
-L 1275 3500 
-L 2222 3500 
-L 2222 3053 
-L 1275 3053 
-L 1275 0 
-L 697 0 
-L 697 3053 
-L 147 3053 
-L 147 3500 
-L 697 3500 
-L 697 3744 
-Q 697 4328 969 4595 
-Q 1241 4863 1831 4863 
-L 2375 4863 
-z
-" transform="scale(0.015625)"/>
-      <path id="DejaVuSans-78" d="M 3513 3500 
-L 2247 1797 
-L 3578 0 
-L 2900 0 
-L 1881 1375 
-L 863 0 
-L 184 0 
-L 1544 1831 
-L 300 3500 
-L 978 3500 
-L 1906 2253 
-L 2834 3500 
-L 3513 3500 
-z
-" transform="scale(0.015625)"/>
-      <path id="DejaVuSans-2d" d="M 313 2009 
-L 1997 2009 
-L 1997 1497 
-L 313 1497 
-L 313 2009 
-z
-" transform="scale(0.015625)"/>
-     </defs>
-     <use xlink:href="#DejaVuSans-66"/>
-     <use xlink:href="#DejaVuSans-69" x="35.205078"/>
-     <use xlink:href="#DejaVuSans-78" x="62.988281"/>
-     <use xlink:href="#DejaVuSans-65" x="119.042969"/>
-     <use xlink:href="#DejaVuSans-64" x="180.566406"/>
-     <use xlink:href="#DejaVuSans-2d" x="244.042969"/>
-     <use xlink:href="#DejaVuSans-64" x="280.126953"/>
-     <use xlink:href="#DejaVuSans-65" x="343.603516"/>
-     <use xlink:href="#DejaVuSans-70" x="405.126953"/>
-     <use xlink:href="#DejaVuSans-74" x="468.603516"/>
-     <use xlink:href="#DejaVuSans-68" x="507.8125"/>
-    </g>
-   </g>
-   <g id="text_22">
-    <!-- adaptive -->
-    <g style="fill: #ff0000" transform="translate(283.083817 125.177721) scale(0.12 -0.12)">
-     <use xlink:href="#DejaVuSans-61"/>
-     <use xlink:href="#DejaVuSans-64" x="61.279297"/>
-     <use xlink:href="#DejaVuSans-61" x="124.755859"/>
-     <use xlink:href="#DejaVuSans-70" x="186.035156"/>
-     <use xlink:href="#DejaVuSans-74" x="249.511719"/>
-     <use xlink:href="#DejaVuSans-69" x="288.720703"/>
-     <use xlink:href="#DejaVuSans-76" x="316.503906"/>
-     <use xlink:href="#DejaVuSans-65" x="375.683594"/>
-    </g>
-   </g>
-   <g id="text_23">
-    <!-- optimized -->
-    <g style="fill: #ff0000" transform="translate(260.043244 64.675772) scale(0.12 -0.12)">
-     <defs>
-      <path id="DejaVuSans-7a" d="M 353 3500 
-L 3084 3500 
-L 3084 2975 
-L 922 459 
-L 3084 459 
-L 3084 0 
-L 275 0 
-L 275 525 
-L 2438 3041 
-L 353 3041 
-L 353 3500 
-z
-" transform="scale(0.015625)"/>
-     </defs>
-     <use xlink:href="#DejaVuSans-6f"/>
-     <use xlink:href="#DejaVuSans-70" x="61.181641"/>
-     <use xlink:href="#DejaVuSans-74" x="124.658203"/>
-     <use xlink:href="#DejaVuSans-69" x="163.867188"/>
-     <use xlink:href="#DejaVuSans-6d" x="191.650391"/>
-     <use xlink:href="#DejaVuSans-69" x="289.0625"/>
-     <use xlink:href="#DejaVuSans-7a" x="316.845703"/>
-     <use xlink:href="#DejaVuSans-65" x="369.335938"/>
-     <use xlink:href="#DejaVuSans-64" x="430.859375"/>
-    </g>
-   </g>
-   <g id="text_24">
-    <!-- LightGlue -->
-    <g style="fill: #ff0000" transform="translate(253.72379 21.69671) scale(0.15 -0.15)">
-     <use xlink:href="#DejaVuSans-4c"/>
-     <use xlink:href="#DejaVuSans-69" x="55.712891"/>
-     <use xlink:href="#DejaVuSans-67" x="83.496094"/>
-     <use xlink:href="#DejaVuSans-68" x="146.972656"/>
-     <use xlink:href="#DejaVuSans-74" x="210.351562"/>
-     <use xlink:href="#DejaVuSans-47" x="249.560547"/>
-     <use xlink:href="#DejaVuSans-6c" x="327.050781"/>
-     <use xlink:href="#DejaVuSans-75" x="354.833984"/>
-     <use xlink:href="#DejaVuSans-65" x="418.212891"/>
-    </g>
-   </g>
-   <g id="line2d_11">
-    <path d="M 337.2777 184.285882 
-L 247.002828 124.838824 
-L 190.224522 53.502353 
-L 163.40411 23.778824 
-" clip-path="url(#pb46ed2897c)" style="fill: none; stroke: #ff0000; stroke-width: 2; stroke-linecap: square"/>
-    <defs>
-     <path id="m8759e5a643" d="M 0 3 
-C 0.795609 3 1.55874 2.683901 2.12132 2.12132 
-C 2.683901 1.55874 3 0.795609 3 0 
-C 3 -0.795609 2.683901 -1.55874 2.12132 -2.12132 
-C 1.55874 -2.683901 0.795609 -3 0 -3 
-C -0.795609 -3 -1.55874 -2.683901 -2.12132 -2.12132 
-C -2.683901 -1.55874 -3 -0.795609 -3 0 
-C -3 0.795609 -2.683901 1.55874 -2.12132 2.12132 
-C -1.55874 2.683901 -0.795609 3 0 3 
-z
-" style="stroke: #ff0000"/>
-    </defs>
-    <g clip-path="url(#pb46ed2897c)">
-     <use xlink:href="#m8759e5a643" x="337.2777" y="184.285882" style="fill: #ff0000; stroke: #ff0000"/>
-     <use xlink:href="#m8759e5a643" x="247.002828" y="124.838824" style="fill: #ff0000; stroke: #ff0000"/>
-     <use xlink:href="#m8759e5a643" x="190.224522" y="53.502353" style="fill: #ff0000; stroke: #ff0000"/>
-     <use xlink:href="#m8759e5a643" x="163.40411" y="23.778824" style="fill: #ff0000; stroke: #ff0000"/>
-    </g>
-   </g>
-   <g id="line2d_12">
-    <path d="M 296.754196 112.949412 
-L 241.630312 71.336471 
-L 214.425531 47.557647 
-L 194.077595 29.723529 
-L 163.121578 23.778824 
-" clip-path="url(#pb46ed2897c)" style="fill: none; stroke-dasharray: 7.4,3.2; stroke-dashoffset: 0; stroke: #ff0000; stroke-width: 2"/>
-   </g>
-  </g>
- </g>
- <defs>
-  <clipPath id="pb46ed2897c">
-   <rect x="38.242188" y="0" width="313.259373" height="202.12"/>
-  </clipPath>
- </defs>
-</svg>

+ 0 - 255
python/LightGlue/benchmark.py

@@ -1,255 +0,0 @@
-# Benchmark script for LightGlue on real images
-import argparse
-import time
-from collections import defaultdict
-from pathlib import Path
-
-import matplotlib.pyplot as plt
-import numpy as np
-import torch
-import torch._dynamo
-
-from lightglue import LightGlue, SuperPoint
-from lightglue.utils import load_image
-
-torch.set_grad_enabled(False)
-
-
-def measure(matcher, data, device="cuda", r=100):
-    timings = np.zeros((r, 1))
-    if device.type == "cuda":
-        starter = torch.cuda.Event(enable_timing=True)
-        ender = torch.cuda.Event(enable_timing=True)
-    # warmup
-    for _ in range(10):
-        _ = matcher(data)
-    # measurements
-    with torch.no_grad():
-        for rep in range(r):
-            if device.type == "cuda":
-                starter.record()
-                _ = matcher(data)
-                ender.record()
-                # sync gpu
-                torch.cuda.synchronize()
-                curr_time = starter.elapsed_time(ender)
-            else:
-                start = time.perf_counter()
-                _ = matcher(data)
-                curr_time = (time.perf_counter() - start) * 1e3
-            timings[rep] = curr_time
-    mean_syn = np.sum(timings) / r
-    std_syn = np.std(timings)
-    return {"mean": mean_syn, "std": std_syn}
-
-
-def print_as_table(d, title, cnames):
-    print()
-    header = f"{title:30} " + " ".join([f"{x:>7}" for x in cnames])
-    print(header)
-    print("-" * len(header))
-    for k, l in d.items():
-        print(f"{k:30}", " ".join([f"{x:>7.1f}" for x in l]))
-
-
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="Benchmark script for LightGlue")
-    parser.add_argument(
-        "--device",
-        choices=["auto", "cuda", "cpu", "mps"],
-        default="auto",
-        help="device to benchmark on",
-    )
-    parser.add_argument("--compile", action="store_true", help="Compile LightGlue runs")
-    parser.add_argument(
-        "--no_flash", action="store_true", help="disable FlashAttention"
-    )
-    parser.add_argument(
-        "--no_prune_thresholds",
-        action="store_true",
-        help="disable pruning thresholds (i.e. always do pruning)",
-    )
-    parser.add_argument(
-        "--add_superglue",
-        action="store_true",
-        help="add SuperGlue to the benchmark (requires hloc)",
-    )
-    parser.add_argument(
-        "--measure", default="time", choices=["time", "log-time", "throughput"]
-    )
-    parser.add_argument(
-        "--repeat", "--r", type=int, default=100, help="repetitions of measurements"
-    )
-    parser.add_argument(
-        "--num_keypoints",
-        nargs="+",
-        type=int,
-        default=[256, 512, 1024, 2048, 4096],
-        help="number of keypoints (list separated by spaces)",
-    )
-    parser.add_argument(
-        "--matmul_precision", default="highest", choices=["highest", "high", "medium"]
-    )
-    parser.add_argument(
-        "--save", default=None, type=str, help="path where figure should be saved"
-    )
-    args = parser.parse_intermixed_args()
-
-    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-    if args.device != "auto":
-        device = torch.device(args.device)
-
-    print("Running benchmark on device:", device)
-
-    images = Path("assets")
-    inputs = {
-        "easy": (
-            load_image(images / "DSC_0411.JPG"),
-            load_image(images / "DSC_0410.JPG"),
-        ),
-        "difficult": (
-            load_image(images / "sacre_coeur1.jpg"),
-            load_image(images / "sacre_coeur2.jpg"),
-        ),
-    }
-
-    configs = {
-        "LightGlue-full": {
-            "depth_confidence": -1,
-            "width_confidence": -1,
-        },
-        # 'LG-prune': {
-        #     'width_confidence': -1,
-        # },
-        # 'LG-depth': {
-        #     'depth_confidence': -1,
-        # },
-        "LightGlue-adaptive": {},
-    }
-
-    if args.compile:
-        configs = {**configs, **{k + "-compile": v for k, v in configs.items()}}
-
-    sg_configs = {
-        # 'SuperGlue': {},
-        "SuperGlue-fast": {"sinkhorn_iterations": 5}
-    }
-
-    torch.set_float32_matmul_precision(args.matmul_precision)
-
-    results = {k: defaultdict(list) for k, v in inputs.items()}
-
-    extractor = SuperPoint(max_num_keypoints=None, detection_threshold=-1)
-    extractor = extractor.eval().to(device)
-    figsize = (len(inputs) * 4.5, 4.5)
-    fig, axes = plt.subplots(1, len(inputs), sharey=True, figsize=figsize)
-    axes = axes if len(inputs) > 1 else [axes]
-    fig.canvas.manager.set_window_title(f"LightGlue benchmark ({device.type})")
-
-    for title, ax in zip(inputs.keys(), axes):
-        ax.set_xscale("log", base=2)
-        bases = [2**x for x in range(7, 16)]
-        ax.set_xticks(bases, bases)
-        ax.grid(which="major")
-        if args.measure == "log-time":
-            ax.set_yscale("log")
-            yticks = [10**x for x in range(6)]
-            ax.set_yticks(yticks, yticks)
-            mpos = [10**x * i for x in range(6) for i in range(2, 10)]
-            mlabel = [
-                10**x * i if i in [2, 5] else None
-                for x in range(6)
-                for i in range(2, 10)
-            ]
-            ax.set_yticks(mpos, mlabel, minor=True)
-            ax.grid(which="minor", linewidth=0.2)
-        ax.set_title(title)
-
-        ax.set_xlabel("# keypoints")
-        if args.measure == "throughput":
-            ax.set_ylabel("Throughput [pairs/s]")
-        else:
-            ax.set_ylabel("Latency [ms]")
-
-    for name, conf in configs.items():
-        print("Run benchmark for:", name)
-        torch.cuda.empty_cache()
-        matcher = LightGlue(features="superpoint", flash=not args.no_flash, **conf)
-        if args.no_prune_thresholds:
-            matcher.pruning_keypoint_thresholds = {
-                k: -1 for k in matcher.pruning_keypoint_thresholds
-            }
-        matcher = matcher.eval().to(device)
-        if name.endswith("compile"):
-            import torch._dynamo
-
-            torch._dynamo.reset()  # avoid buffer overflow
-            matcher.compile()
-        for pair_name, ax in zip(inputs.keys(), axes):
-            image0, image1 = [x.to(device) for x in inputs[pair_name]]
-            runtimes = []
-            for num_kpts in args.num_keypoints:
-                extractor.conf.max_num_keypoints = num_kpts
-                feats0 = extractor.extract(image0)
-                feats1 = extractor.extract(image1)
-                runtime = measure(
-                    matcher,
-                    {"image0": feats0, "image1": feats1},
-                    device=device,
-                    r=args.repeat,
-                )["mean"]
-                results[pair_name][name].append(
-                    1000 / runtime if args.measure == "throughput" else runtime
-                )
-            ax.plot(
-                args.num_keypoints, results[pair_name][name], label=name, marker="o"
-            )
-        del matcher, feats0, feats1
-
-    if args.add_superglue:
-        from hloc.matchers.superglue import SuperGlue
-
-        for name, conf in sg_configs.items():
-            print("Run benchmark for:", name)
-            matcher = SuperGlue(conf)
-            matcher = matcher.eval().to(device)
-            for pair_name, ax in zip(inputs.keys(), axes):
-                image0, image1 = [x.to(device) for x in inputs[pair_name]]
-                runtimes = []
-                for num_kpts in args.num_keypoints:
-                    extractor.conf.max_num_keypoints = num_kpts
-                    feats0 = extractor.extract(image0)
-                    feats1 = extractor.extract(image1)
-                    data = {
-                        "image0": image0[None],
-                        "image1": image1[None],
-                        **{k + "0": v for k, v in feats0.items()},
-                        **{k + "1": v for k, v in feats1.items()},
-                    }
-                    data["scores0"] = data["keypoint_scores0"]
-                    data["scores1"] = data["keypoint_scores1"]
-                    data["descriptors0"] = (
-                        data["descriptors0"].transpose(-1, -2).contiguous()
-                    )
-                    data["descriptors1"] = (
-                        data["descriptors1"].transpose(-1, -2).contiguous()
-                    )
-                    runtime = measure(matcher, data, device=device, r=args.repeat)[
-                        "mean"
-                    ]
-                    results[pair_name][name].append(
-                        1000 / runtime if args.measure == "throughput" else runtime
-                    )
-                ax.plot(
-                    args.num_keypoints, results[pair_name][name], label=name, marker="o"
-                )
-            del matcher, data, image0, image1, feats0, feats1
-
-    for name, runtimes in results.items():
-        print_as_table(runtimes, name, args.num_keypoints)
-
-    axes[0].legend()
-    fig.tight_layout()
-    if args.save:
-        plt.savefig(args.save, dpi=fig.dpi)
-    plt.show()
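
Note on the removed `measure` helper above: a minimal, torch-free sketch of its CPU branch (warm up, time each repetition with `time.perf_counter`, then report mean and population standard deviation in milliseconds). The GPU branch, which uses `torch.cuda.Event` and `torch.cuda.synchronize()`, is not reproduced here. The names `fn`, `warmup`, and `to_throughput` are illustrative assumptions and do not appear in the deleted file.

```python
import statistics
import time


def measure(fn, data, repeats=100, warmup=10):
    """Time fn(data) in milliseconds: warm up first, then time repeated runs."""
    # Warmup runs are executed but not timed.
    for _ in range(warmup):
        fn(data)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        # perf_counter returns seconds; convert to milliseconds.
        timings.append((time.perf_counter() - start) * 1e3)
    # pstdev is the population standard deviation, matching np.std's default.
    return {"mean": statistics.fmean(timings), "std": statistics.pstdev(timings)}


def to_throughput(mean_ms):
    """Convert a mean latency in ms into items per second (hypothetical helper)."""
    return 1000 / mean_ms
```

The deleted script applies the same `1000 / runtime` conversion when `--measure throughput` is selected, and otherwise reports the mean latency directly.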

File diff suppressed because it is too large
+ 0 - 77
python/LightGlue/demo.ipynb


+ 0 - 7
python/LightGlue/lightglue/__init__.py

@@ -1,7 +0,0 @@
-from .aliked import ALIKED  # noqa
-from .disk import DISK  # noqa
-from .dog_hardnet import DoGHardNet  # noqa
-from .lightglue import LightGlue  # noqa
-from .sift import SIFT  # noqa
-from .superpoint import SuperPoint  # noqa
-from .utils import match_pair  # noqa

+ 0 - 775
python/LightGlue/lightglue/aliked.py

@@ -1,775 +0,0 @@
-# BSD 3-Clause License
-
-# Copyright (c) 2022, Zhao Xiaoming
-# All rights reserved.
-
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions are met:
-
-# 1. Redistributions of source code must retain the above copyright notice, this
-#    list of conditions and the following disclaimer.
-
-# 2. Redistributions in binary form must reproduce the above copyright notice,
-#    this list of conditions and the following disclaimer in the documentation
-#    and/or other materials provided with the distribution.
-
-# 3. Neither the name of the copyright holder nor the names of its
-#    contributors may be used to endorse or promote products derived from
-#    this software without specific prior written permission.
-
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
-# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
-# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
-# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-# Authors:
-# Xiaoming Zhao, Xingming Wu, Weihai Chen, Peter C.Y. Chen, Qingsong Xu, and Zhengguo Li
-# Code from https://github.com/Shiaoming/ALIKED
-
-from typing import Callable, Optional
-
-import torch
-import torch.nn.functional as F
-import torchvision
-from kornia.color import grayscale_to_rgb
-from torch import nn
-from torch.nn.modules.utils import _pair
-from torchvision.models import resnet
-
-from .utils import Extractor, ImagePreprocessor
-
-
-def get_patches(
-    tensor: torch.Tensor, required_corners: torch.Tensor, ps: int
-) -> torch.Tensor:
-    c, h, w = tensor.shape
-    corner = (required_corners - ps / 2 + 1).long()
-    corner[:, 0] = corner[:, 0].clamp(min=0, max=w - 1 - ps)
-    corner[:, 1] = corner[:, 1].clamp(min=0, max=h - 1 - ps)
-    offset = torch.arange(0, ps)
-
-    kw = {"indexing": "ij"} if torch.__version__ >= "1.10" else {}
-    x, y = torch.meshgrid(offset, offset, **kw)
-    patches = torch.stack((x, y)).permute(2, 1, 0).unsqueeze(2)
-    patches = patches.to(corner) + corner[None, None]
-    pts = patches.reshape(-1, 2)
-    sampled = tensor.permute(1, 2, 0)[tuple(pts.T)[::-1]]
-    sampled = sampled.reshape(ps, ps, -1, c)
-    assert sampled.shape[:3] == patches.shape[:3]
-    return sampled.permute(2, 3, 0, 1)
-
-
-def simple_nms(scores: torch.Tensor, nms_radius: int):
-    """Fast Non-maximum suppression to remove nearby points"""
-
-    zeros = torch.zeros_like(scores)
-    max_mask = scores == torch.nn.functional.max_pool2d(
-        scores, kernel_size=nms_radius * 2 + 1, stride=1, padding=nms_radius
-    )
-
-    for _ in range(2):
-        supp_mask = (
-            torch.nn.functional.max_pool2d(
-                max_mask.float(),
-                kernel_size=nms_radius * 2 + 1,
-                stride=1,
-                padding=nms_radius,
-            )
-            > 0
-        )
-        supp_scores = torch.where(supp_mask, zeros, scores)
-        new_max_mask = supp_scores == torch.nn.functional.max_pool2d(
-            supp_scores, kernel_size=nms_radius * 2 + 1, stride=1, padding=nms_radius
-        )
-        max_mask = max_mask | (new_max_mask & (~supp_mask))
-    return torch.where(max_mask, scores, zeros)
-
-
-class DKD(nn.Module):
-    def __init__(
-        self,
-        radius: int = 2,
-        top_k: int = 0,
-        scores_th: float = 0.2,
-        n_limit: int = 20000,
-    ):
-        """
-        Args:
-            radius: soft detection radius, kernel size is (2 * radius + 1)
-            top_k: top_k > 0: return top k keypoints
-            scores_th: top_k <= 0 threshold mode:
-                scores_th > 0: return keypoints with scores>scores_th
-                else: return keypoints with scores > scores.mean()
-            n_limit: max number of keypoint in threshold mode
-        """
-        super().__init__()
-        self.radius = radius
-        self.top_k = top_k
-        self.scores_th = scores_th
-        self.n_limit = n_limit
-        self.kernel_size = 2 * self.radius + 1
-        self.temperature = 0.1  # tuned temperature
-        self.unfold = nn.Unfold(kernel_size=self.kernel_size, padding=self.radius)
-        # local xy grid
-        x = torch.linspace(-self.radius, self.radius, self.kernel_size)
-        # (kernel_size*kernel_size) x 2 : (w,h)
-        kw = {"indexing": "ij"} if torch.__version__ >= "1.10" else {}
-        self.hw_grid = (
-            torch.stack(torch.meshgrid([x, x], **kw)).view(2, -1).t()[:, [1, 0]]
-        )
-
-    def forward(
-        self,
-        scores_map: torch.Tensor,
-        sub_pixel: bool = True,
-        image_size: Optional[torch.Tensor] = None,
-    ):
-        """
-        :param scores_map: Bx1xHxW
-        :param descriptor_map: BxCxHxW
-        :param sub_pixel: whether to use sub-pixel keypoint detection
-        :return: kpts: list[Nx2,...]; kptscores: list[N,....] normalised position: -1~1
-        """
-        b, c, h, w = scores_map.shape
-        scores_nograd = scores_map.detach()
-        nms_scores = simple_nms(scores_nograd, self.radius)
-
-        # remove border
-        nms_scores[:, :, : self.radius, :] = 0
-        nms_scores[:, :, :, : self.radius] = 0
-        if image_size is not None:
-            for i in range(scores_map.shape[0]):
-                w, h = image_size[i].long()
-                nms_scores[i, :, h.item() - self.radius :, :] = 0
-                nms_scores[i, :, :, w.item() - self.radius :] = 0
-        else:
-            nms_scores[:, :, -self.radius :, :] = 0
-            nms_scores[:, :, :, -self.radius :] = 0
-
-        # detect keypoints without grad
-        if self.top_k > 0:
-            topk = torch.topk(nms_scores.view(b, -1), self.top_k)
-            indices_keypoints = [topk.indices[i] for i in range(b)]  # B x top_k
-        else:
-            if self.scores_th > 0:
-                masks = nms_scores > self.scores_th
-                if masks.sum() == 0:
-                    th = scores_nograd.reshape(b, -1).mean(dim=1)  # th = self.scores_th
-                    masks = nms_scores > th.reshape(b, 1, 1, 1)
-            else:
-                th = scores_nograd.reshape(b, -1).mean(dim=1)  # th = self.scores_th
-                masks = nms_scores > th.reshape(b, 1, 1, 1)
-            masks = masks.reshape(b, -1)
-
-            indices_keypoints = []  # list, B x (any size)
-            scores_view = scores_nograd.reshape(b, -1)
-            for mask, scores in zip(masks, scores_view):
-                indices = mask.nonzero()[:, 0]
-                if len(indices) > self.n_limit:
-                    kpts_sc = scores[indices]
-                    sort_idx = kpts_sc.sort(descending=True)[1]
-                    sel_idx = sort_idx[: self.n_limit]
-                    indices = indices[sel_idx]
-                indices_keypoints.append(indices)
-
-        wh = torch.tensor([w - 1, h - 1], device=scores_nograd.device)
-
-        keypoints = []
-        scoredispersitys = []
-        kptscores = []
-        if sub_pixel:
-            # detect soft keypoints with grad backpropagation
-            patches = self.unfold(scores_map)  # B x (kernel**2) x (H*W)
-            self.hw_grid = self.hw_grid.to(scores_map)  # to device
-            for b_idx in range(b):
-                patch = patches[b_idx].t()  # (H*W) x (kernel**2)
-                indices_kpt = indices_keypoints[
-                    b_idx
-                ]  # one dimension vector, say its size is M
-                patch_scores = patch[indices_kpt]  # M x (kernel**2)
-                keypoints_xy_nms = torch.stack(
-                    [indices_kpt % w, torch.div(indices_kpt, w, rounding_mode="trunc")],
-                    dim=1,
-                )  # Mx2
-
-                # max is detached to prevent undesired backprop loops in the graph
-                max_v = patch_scores.max(dim=1).values.detach()[:, None]
-                x_exp = (
-                    (patch_scores - max_v) / self.temperature
-                ).exp()  # M * (kernel**2), in [0, 1]
-
-                # \frac{ \sum{(i,j) \times \exp(x/T)} }{ \sum{\exp(x/T)} }
-                xy_residual = (
-                    x_exp @ self.hw_grid / x_exp.sum(dim=1)[:, None]
-                )  # Soft-argmax, Mx2
-
-                hw_grid_dist2 = (
-                    torch.norm(
-                        (self.hw_grid[None, :, :] - xy_residual[:, None, :])
-                        / self.radius,
-                        dim=-1,
-                    )
-                    ** 2
-                )
-                scoredispersity = (x_exp * hw_grid_dist2).sum(dim=1) / x_exp.sum(dim=1)
-
-                # compute result keypoints
-                keypoints_xy = keypoints_xy_nms + xy_residual
-                keypoints_xy = keypoints_xy / wh * 2 - 1  # (w,h) -> (-1~1,-1~1)
-
-                kptscore = torch.nn.functional.grid_sample(
-                    scores_map[b_idx].unsqueeze(0),
-                    keypoints_xy.view(1, 1, -1, 2),
-                    mode="bilinear",
-                    align_corners=True,
-                )[
-                    0, 0, 0, :
-                ]  # CxN
-
-                keypoints.append(keypoints_xy)
-                scoredispersitys.append(scoredispersity)
-                kptscores.append(kptscore)
-        else:
-            for b_idx in range(b):
-                indices_kpt = indices_keypoints[
-                    b_idx
-                ]  # one dimension vector, say its size is M
-                # To avoid warning: UserWarning: __floordiv__ is deprecated
-                keypoints_xy_nms = torch.stack(
-                    [indices_kpt % w, torch.div(indices_kpt, w, rounding_mode="trunc")],
-                    dim=1,
-                )  # Mx2
-                keypoints_xy = keypoints_xy_nms / wh * 2 - 1  # (w,h) -> (-1~1,-1~1)
-                kptscore = torch.nn.functional.grid_sample(
-                    scores_map[b_idx].unsqueeze(0),
-                    keypoints_xy.view(1, 1, -1, 2),
-                    mode="bilinear",
-                    align_corners=True,
-                )[
-                    0, 0, 0, :
-                ]  # CxN
-                keypoints.append(keypoints_xy)
-                scoredispersitys.append(kptscore)  # for jit.script compatability
-                kptscores.append(kptscore)
-
-        return keypoints, kptscores, scoredispersitys
-
-
-class InputPadder(object):
-    """Pads images such that dimensions are divisible by 8"""
-
-    def __init__(self, h: int, w: int, divis_by: int = 8):
-        self.ht = h
-        self.wd = w
-        pad_ht = (((self.ht // divis_by) + 1) * divis_by - self.ht) % divis_by
-        pad_wd = (((self.wd // divis_by) + 1) * divis_by - self.wd) % divis_by
-        self._pad = [
-            pad_wd // 2,
-            pad_wd - pad_wd // 2,
-            pad_ht // 2,
-            pad_ht - pad_ht // 2,
-        ]
-
-    def pad(self, x: torch.Tensor):
-        assert x.ndim == 4
-        return F.pad(x, self._pad, mode="replicate")
-
-    def unpad(self, x: torch.Tensor):
-        assert x.ndim == 4
-        ht = x.shape[-2]
-        wd = x.shape[-1]
-        c = [self._pad[2], ht - self._pad[3], self._pad[0], wd - self._pad[1]]
-        return x[..., c[0] : c[1], c[2] : c[3]]
-
-
-class DeformableConv2d(nn.Module):
-    def __init__(
-        self,
-        in_channels,
-        out_channels,
-        kernel_size=3,
-        stride=1,
-        padding=1,
-        bias=False,
-        mask=False,
-    ):
-        super(DeformableConv2d, self).__init__()
-
-        self.padding = padding
-        self.mask = mask
-
-        self.channel_num = (
-            3 * kernel_size * kernel_size if mask else 2 * kernel_size * kernel_size
-        )
-        self.offset_conv = nn.Conv2d(
-            in_channels,
-            self.channel_num,
-            kernel_size=kernel_size,
-            stride=stride,
-            padding=self.padding,
-            bias=True,
-        )
-
-        self.regular_conv = nn.Conv2d(
-            in_channels=in_channels,
-            out_channels=out_channels,
-            kernel_size=kernel_size,
-            stride=stride,
-            padding=self.padding,
-            bias=bias,
-        )
-
-    def forward(self, x):
-        h, w = x.shape[2:]
-        max_offset = max(h, w) / 4.0
-
-        out = self.offset_conv(x)
-        if self.mask:
-            o1, o2, mask = torch.chunk(out, 3, dim=1)
-            offset = torch.cat((o1, o2), dim=1)
-            mask = torch.sigmoid(mask)
-        else:
-            offset = out
-            mask = None
-        offset = offset.clamp(-max_offset, max_offset)
-        x = torchvision.ops.deform_conv2d(
-            input=x,
-            offset=offset,
-            weight=self.regular_conv.weight,
-            bias=self.regular_conv.bias,
-            padding=self.padding,
-            mask=mask,
-        )
-        return x
-
-
-def get_conv(
-    inplanes,
-    planes,
-    kernel_size=3,
-    stride=1,
-    padding=1,
-    bias=False,
-    conv_type="conv",
-    mask=False,
-):
-    if conv_type == "conv":
-        conv = nn.Conv2d(
-            inplanes,
-            planes,
-            kernel_size=kernel_size,
-            stride=stride,
-            padding=padding,
-            bias=bias,
-        )
-    elif conv_type == "dcn":
-        conv = DeformableConv2d(
-            inplanes,
-            planes,
-            kernel_size=kernel_size,
-            stride=stride,
-            padding=_pair(padding),
-            bias=bias,
-            mask=mask,
-        )
-    else:
-        raise TypeError
-    return conv
-
-
-class ConvBlock(nn.Module):
-    def __init__(
-        self,
-        in_channels,
-        out_channels,
-        gate: Optional[Callable[..., nn.Module]] = None,
-        norm_layer: Optional[Callable[..., nn.Module]] = None,
-        conv_type: str = "conv",
-        mask: bool = False,
-    ):
-        super().__init__()
-        if gate is None:
-            self.gate = nn.ReLU(inplace=True)
-        else:
-            self.gate = gate
-        if norm_layer is None:
-            norm_layer = nn.BatchNorm2d
-        self.conv1 = get_conv(
-            in_channels, out_channels, kernel_size=3, conv_type=conv_type, mask=mask
-        )
-        self.bn1 = norm_layer(out_channels)
-        self.conv2 = get_conv(
-            out_channels, out_channels, kernel_size=3, conv_type=conv_type, mask=mask
-        )
-        self.bn2 = norm_layer(out_channels)
-
-    def forward(self, x):
-        x = self.gate(self.bn1(self.conv1(x)))  # B x in_channels x H x W
-        x = self.gate(self.bn2(self.conv2(x)))  # B x out_channels x H x W
-        return x
-
-
-# modified based on torchvision\models\resnet.py#27->BasicBlock
-class ResBlock(nn.Module):
-    expansion: int = 1
-
-    def __init__(
-        self,
-        inplanes: int,
-        planes: int,
-        stride: int = 1,
-        downsample: Optional[nn.Module] = None,
-        groups: int = 1,
-        base_width: int = 64,
-        dilation: int = 1,
-        gate: Optional[Callable[..., nn.Module]] = None,
-        norm_layer: Optional[Callable[..., nn.Module]] = None,
-        conv_type: str = "conv",
-        mask: bool = False,
-    ) -> None:
-        super(ResBlock, self).__init__()
-        if gate is None:
-            self.gate = nn.ReLU(inplace=True)
-        else:
-            self.gate = gate
-        if norm_layer is None:
-            norm_layer = nn.BatchNorm2d
-        if groups != 1 or base_width != 64:
-            raise ValueError("ResBlock only supports groups=1 and base_width=64")
-        if dilation > 1:
-            raise NotImplementedError("Dilation > 1 not supported in ResBlock")
-        # Both self.conv1 and self.downsample layers
-        # downsample the input when stride != 1
-        self.conv1 = get_conv(
-            inplanes, planes, kernel_size=3, conv_type=conv_type, mask=mask
-        )
-        self.bn1 = norm_layer(planes)
-        self.conv2 = get_conv(
-            planes, planes, kernel_size=3, conv_type=conv_type, mask=mask
-        )
-        self.bn2 = norm_layer(planes)
-        self.downsample = downsample
-        self.stride = stride
-
-    def forward(self, x: torch.Tensor) -> torch.Tensor:
-        identity = x
-
-        out = self.conv1(x)
-        out = self.bn1(out)
-        out = self.gate(out)
-
-        out = self.conv2(out)
-        out = self.bn2(out)
-
-        if self.downsample is not None:
-            identity = self.downsample(x)
-
-        out += identity
-        out = self.gate(out)
-
-        return out
-
-
-class SDDH(nn.Module):
-    def __init__(
-        self,
-        dims: int,
-        kernel_size: int = 3,
-        n_pos: int = 8,
-        gate=nn.ReLU(),
-        conv2D=False,
-        mask=False,
-    ):
-        super(SDDH, self).__init__()
-        self.kernel_size = kernel_size
-        self.n_pos = n_pos
-        self.conv2D = conv2D
-        self.mask = mask
-
-        self.get_patches_func = get_patches
-
-        # estimate offsets
-        self.channel_num = 3 * n_pos if mask else 2 * n_pos
-        self.offset_conv = nn.Sequential(
-            nn.Conv2d(
-                dims,
-                self.channel_num,
-                kernel_size=kernel_size,
-                stride=1,
-                padding=0,
-                bias=True,
-            ),
-            gate,
-            nn.Conv2d(
-                self.channel_num,
-                self.channel_num,
-                kernel_size=1,
-                stride=1,
-                padding=0,
-                bias=True,
-            ),
-        )
-
-        # sampled feature conv
-        self.sf_conv = nn.Conv2d(
-            dims, dims, kernel_size=1, stride=1, padding=0, bias=False
-        )
-
-        # convM
-        if not conv2D:
-            # deformable desc weights
-            agg_weights = torch.nn.Parameter(torch.rand(n_pos, dims, dims))
-            self.register_parameter("agg_weights", agg_weights)
-        else:
-            self.convM = nn.Conv2d(
-                dims * n_pos, dims, kernel_size=1, stride=1, padding=0, bias=False
-            )
-
-    def forward(self, x, keypoints):
-        # x: [B,C,H,W]
-        # keypoints: list, [[N_kpts,2], ...] (w,h)
-        b, c, h, w = x.shape
-        wh = torch.tensor([[w - 1, h - 1]], device=x.device)
-        max_offset = max(h, w) / 4.0
-
-        offsets = []
-        descriptors = []
-        # get offsets for each keypoint
-        for ib in range(b):
-            xi, kptsi = x[ib], keypoints[ib]
-            kptsi_wh = (kptsi / 2 + 0.5) * wh
-            N_kpts = len(kptsi)
-
-            if self.kernel_size > 1:
-                patch = self.get_patches_func(
-                    xi, kptsi_wh.long(), self.kernel_size
-                )  # [N_kpts, C, K, K]
-            else:
-                kptsi_wh_long = kptsi_wh.long()
-                patch = (
-                    xi[:, kptsi_wh_long[:, 1], kptsi_wh_long[:, 0]]
-                    .permute(1, 0)
-                    .reshape(N_kpts, c, 1, 1)
-                )
-
-            offset = self.offset_conv(patch).clamp(
-                -max_offset, max_offset
-            )  # [N_kpts, 2*n_pos, 1, 1]
-            if self.mask:
-                offset = (
-                    offset[:, :, 0, 0].view(N_kpts, 3, self.n_pos).permute(0, 2, 1)
-                )  # [N_kpts, n_pos, 3]
-                offset = offset[:, :, :-1]  # [N_kpts, n_pos, 2]
-                mask_weight = torch.sigmoid(offset[:, :, -1])  # [N_kpts, n_pos]
-            else:
-                offset = (
-                    offset[:, :, 0, 0].view(N_kpts, 2, self.n_pos).permute(0, 2, 1)
-                )  # [N_kpts, n_pos, 2]
-            offsets.append(offset)  # for visualization
-
-            # get sample positions
-            pos = kptsi_wh.unsqueeze(1) + offset  # [N_kpts, n_pos, 2]
-            pos = 2.0 * pos / wh[None] - 1
-            pos = pos.reshape(1, N_kpts * self.n_pos, 1, 2)
-
-            # sample features
-            features = F.grid_sample(
-                xi.unsqueeze(0), pos, mode="bilinear", align_corners=True
-            )  # [1,C,(N_kpts*n_pos),1]
-            features = features.reshape(c, N_kpts, self.n_pos, 1).permute(
-                1, 0, 2, 3
-            )  # [N_kpts, C, n_pos, 1]
-            if self.mask:
-                features = torch.einsum("ncpo,np->ncpo", features, mask_weight)
-
-            features = torch.selu_(self.sf_conv(features)).squeeze(
-                -1
-            )  # [N_kpts, C, n_pos]
-            # convM
-            if not self.conv2D:
-                descs = torch.einsum(
-                    "ncp,pcd->nd", features, self.agg_weights
-                )  # [N_kpts, C]
-            else:
-                features = features.reshape(N_kpts, -1)[
-                    :, :, None, None
-                ]  # [N_kpts, C*n_pos, 1, 1]
-                descs = self.convM(features).squeeze()  # [N_kpts, C]
-
-            # normalize
-            descs = F.normalize(descs, p=2.0, dim=1)
-            descriptors.append(descs)
-
-        return descriptors, offsets
-
-
-class ALIKED(Extractor):
-    default_conf = {
-        "model_name": "aliked-n16",
-        "max_num_keypoints": -1,
-        "detection_threshold": 0.2,
-        "nms_radius": 2,
-    }
-
-    checkpoint_url = "https://github.com/Shiaoming/ALIKED/raw/main/models/{}.pth"
-
-    n_limit_max = 20000
-
-    # c1, c2, c3, c4, dim, K, M
-    cfgs = {
-        "aliked-t16": [8, 16, 32, 64, 64, 3, 16],
-        "aliked-n16": [16, 32, 64, 128, 128, 3, 16],
-        "aliked-n16rot": [16, 32, 64, 128, 128, 3, 16],
-        "aliked-n32": [16, 32, 64, 128, 128, 3, 32],
-    }
-    preprocess_conf = {
-        "resize": 1024,
-    }
-
-    required_data_keys = ["image"]
-
-    def __init__(self, **conf):
-        super().__init__(**conf)  # Update with default configuration.
-        conf = self.conf
-        c1, c2, c3, c4, dim, K, M = self.cfgs[conf.model_name]
-        conv_types = ["conv", "conv", "dcn", "dcn"]
-        conv2D = False
-        mask = False
-
-        # build model
-        self.pool2 = nn.AvgPool2d(kernel_size=2, stride=2)
-        self.pool4 = nn.AvgPool2d(kernel_size=4, stride=4)
-        self.norm = nn.BatchNorm2d
-        self.gate = nn.SELU(inplace=True)
-        self.block1 = ConvBlock(3, c1, self.gate, self.norm, conv_type=conv_types[0])
-        self.block2 = self.get_resblock(c1, c2, conv_types[1], mask)
-        self.block3 = self.get_resblock(c2, c3, conv_types[2], mask)
-        self.block4 = self.get_resblock(c3, c4, conv_types[3], mask)
-
-        self.conv1 = resnet.conv1x1(c1, dim // 4)
-        self.conv2 = resnet.conv1x1(c2, dim // 4)
-        self.conv3 = resnet.conv1x1(c3, dim // 4)
-        self.conv4 = resnet.conv1x1(dim, dim // 4)
-        self.upsample2 = nn.Upsample(
-            scale_factor=2, mode="bilinear", align_corners=True
-        )
-        self.upsample4 = nn.Upsample(
-            scale_factor=4, mode="bilinear", align_corners=True
-        )
-        self.upsample8 = nn.Upsample(
-            scale_factor=8, mode="bilinear", align_corners=True
-        )
-        self.upsample32 = nn.Upsample(
-            scale_factor=32, mode="bilinear", align_corners=True
-        )
-        self.score_head = nn.Sequential(
-            resnet.conv1x1(dim, 8),
-            self.gate,
-            resnet.conv3x3(8, 4),
-            self.gate,
-            resnet.conv3x3(4, 4),
-            self.gate,
-            resnet.conv3x3(4, 1),
-        )
-        self.desc_head = SDDH(dim, K, M, gate=self.gate, conv2D=conv2D, mask=mask)
-        self.dkd = DKD(
-            radius=conf.nms_radius,
-            top_k=-1 if conf.detection_threshold > 0 else conf.max_num_keypoints,
-            scores_th=conf.detection_threshold,
-            n_limit=(
-                conf.max_num_keypoints
-                if conf.max_num_keypoints > 0
-                else self.n_limit_max
-            ),
-        )
-
-        state_dict = torch.hub.load_state_dict_from_url(
-            self.checkpoint_url.format(conf.model_name), map_location="cpu"
-        )
-        self.load_state_dict(state_dict, strict=True)
-
-    def get_resblock(self, c_in, c_out, conv_type, mask):
-        return ResBlock(
-            c_in,
-            c_out,
-            1,
-            nn.Conv2d(c_in, c_out, 1),
-            gate=self.gate,
-            norm_layer=self.norm,
-            conv_type=conv_type,
-            mask=mask,
-        )
-
-    def extract_dense_map(self, image):
-        # Pads images such that dimensions are divisible by div_by (2**5 = 32)
-        div_by = 2**5
-        padder = InputPadder(image.shape[-2], image.shape[-1], div_by)
-        image = padder.pad(image)
-
-        # ================================== feature encoder
-        x1 = self.block1(image)  # B x c1 x H x W
-        x2 = self.pool2(x1)
-        x2 = self.block2(x2)  # B x c2 x H/2 x W/2
-        x3 = self.pool4(x2)
-        x3 = self.block3(x3)  # B x c3 x H/8 x W/8
-        x4 = self.pool4(x3)
-        x4 = self.block4(x4)  # B x dim x H/32 x W/32
-        # ================================== feature aggregation
-        x1 = self.gate(self.conv1(x1))  # B x dim//4 x H x W
-        x2 = self.gate(self.conv2(x2))  # B x dim//4 x H//2 x W//2
-        x3 = self.gate(self.conv3(x3))  # B x dim//4 x H//8 x W//8
-        x4 = self.gate(self.conv4(x4))  # B x dim//4 x H//32 x W//32
-        x2_up = self.upsample2(x2)  # B x dim//4 x H x W
-        x3_up = self.upsample8(x3)  # B x dim//4 x H x W
-        x4_up = self.upsample32(x4)  # B x dim//4 x H x W
-        x1234 = torch.cat([x1, x2_up, x3_up, x4_up], dim=1)
-        # ================================== score head
-        score_map = torch.sigmoid(self.score_head(x1234))
-        feature_map = torch.nn.functional.normalize(x1234, p=2, dim=1)
-
-        # Unpads images
-        feature_map = padder.unpad(feature_map)
-        score_map = padder.unpad(score_map)
-
-        return feature_map, score_map
-
-    def describe(
-        self, keypoints: torch.Tensor, img: torch.Tensor, **conf
-    ) -> torch.Tensor:
-        """Extract descriptors for a set of keypoints."""
-        if img.dim() == 3:
-            img = img[None]  # add batch dim
-        assert img.dim() == 4 and img.shape[0] == 1
-        w, h = img.shape[-2:][::-1]
-        wh = torch.tensor([w - 1, h - 1], device=img.device)
-        img, _ = ImagePreprocessor(**{**self.preprocess_conf, **conf})(img)
-        keypoints_n = 2.0 * keypoints / wh[None, None] - 1  # [-1, 1]
-        # Extract dense features on resized img
-        feature_map, _ = self.extract_dense_map(img)
-        return torch.stack(self.desc_head(feature_map, keypoints_n)[0])
-
-    def forward(self, data: dict) -> dict:
-        image = data["image"]
-        if image.shape[1] == 1:
-            image = grayscale_to_rgb(image)
-        feature_map, score_map = self.extract_dense_map(image)
-        keypoints, kptscores, scoredispersitys = self.dkd(
-            score_map, image_size=data.get("image_size")
-        )
-        descriptors, offsets = self.desc_head(feature_map, keypoints)
-
-        _, _, h, w = image.shape
-        wh = torch.tensor([w - 1, h - 1], device=image.device)
-        # no padding required
-        # we can set detection_threshold=-1 and conf.max_num_keypoints > 0
-        return {
-            "keypoints": wh * (torch.stack(keypoints) + 1) / 2.0,  # B x N x 2
-            "descriptors": torch.stack(descriptors),  # B x N x D
-            "keypoint_scores": torch.stack(kptscores),  # B x N
-        }
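
Review note on the coordinate convention in the removed `ALIKED.forward` and `ALIKED.describe`: keypoints are held in [-1, 1] and converted with `wh = [w - 1, h - 1]`, so -1 and 1 land on the first and last pixel indices. The sketch below restates that mapping in plain Python for a single point. `to_pixels` and `to_normalized` are hypothetical helper names, not part of LightGlue, and tuples stand in for tensors.

```python
def to_pixels(kpt_norm, width, height):
    """Map one keypoint from [-1, 1] to pixel coordinates.

    Mirrors `wh * (keypoints + 1) / 2.0` with wh = [w - 1, h - 1].
    """
    x, y = kpt_norm
    return ((width - 1) * (x + 1) / 2.0, (height - 1) * (y + 1) / 2.0)


def to_normalized(kpt_px, width, height):
    """Map one keypoint from pixel coordinates to [-1, 1].

    Mirrors `2.0 * keypoints / wh - 1` with wh = [w - 1, h - 1].
    """
    x, y = kpt_px
    return (2.0 * x / (width - 1) - 1, 2.0 * y / (height - 1) - 1)


if __name__ == "__main__":
    # Corners of a 640 x 480 image map to the first and last pixel indices.
    print(to_pixels((-1.0, -1.0), 640, 480))
    print(to_pixels((1.0, 1.0), 640, 480))
```

Because both directions divide by `w - 1` and `h - 1`, the two helpers are exact inverses of each other at the image corners, which matches the `align_corners=True` setting used in the removed code.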

+ 0 - 55
python/LightGlue/lightglue/disk.py

@@ -1,55 +0,0 @@
-import kornia
-import torch
-
-from .utils import Extractor
-
-
-class DISK(Extractor):
-    default_conf = {
-        "weights": "depth",
-        "max_num_keypoints": None,
-        "desc_dim": 128,
-        "nms_window_size": 5,
-        "detection_threshold": 0.0,
-        "pad_if_not_divisible": True,
-    }
-
-    preprocess_conf = {
-        "resize": 1024,
-        "grayscale": False,
-    }
-
-    required_data_keys = ["image"]
-
-    def __init__(self, **conf) -> None:
-        super().__init__(**conf)  # Update with default configuration.
-        self.model = kornia.feature.DISK.from_pretrained(self.conf.weights)
-
-    def forward(self, data: dict) -> dict:
-        """Compute keypoints, scores, descriptors for image"""
-        for key in self.required_data_keys:
-            assert key in data, f"Missing key {key} in data"
-        image = data["image"]
-        if image.shape[1] == 1:
-            image = kornia.color.grayscale_to_rgb(image)
-        features = self.model(
-            image,
-            n=self.conf.max_num_keypoints,
-            window_size=self.conf.nms_window_size,
-            score_threshold=self.conf.detection_threshold,
-            pad_if_not_divisible=self.conf.pad_if_not_divisible,
-        )
-        keypoints = [f.keypoints for f in features]
-        scores = [f.detection_scores for f in features]
-        descriptors = [f.descriptors for f in features]
-        del features
-
-        keypoints = torch.stack(keypoints, 0)
-        scores = torch.stack(scores, 0)
-        descriptors = torch.stack(descriptors, 0)
-
-        return {
-            "keypoints": keypoints.to(image).contiguous(),
-            "keypoint_scores": scores.to(image).contiguous(),
-            "descriptors": descriptors.to(image).contiguous(),
-        }

+ 0 - 41
python/LightGlue/lightglue/dog_hardnet.py

@@ -1,41 +0,0 @@
-import torch
-from kornia.color import rgb_to_grayscale
-from kornia.feature import HardNet, LAFDescriptor, laf_from_center_scale_ori
-
-from .sift import SIFT
-
-
-class DoGHardNet(SIFT):
-    required_data_keys = ["image"]
-
-    def __init__(self, **conf):
-        super().__init__(**conf)
-        self.laf_desc = LAFDescriptor(HardNet(True)).eval()
-
-    def forward(self, data: dict) -> dict:
-        image = data["image"]
-        if image.shape[1] == 3:
-            image = rgb_to_grayscale(image)
-        device = image.device
-        self.laf_desc = self.laf_desc.to(device)
-        self.laf_desc.descriptor = self.laf_desc.descriptor.eval()
-        pred = []
-        if "image_size" in data.keys():
-            im_size = data.get("image_size").long()
-        else:
-            im_size = None
-        for k in range(len(image)):
-            img = image[k]
-            if im_size is not None:
-                w, h = data["image_size"][k]
-                img = img[:, : h.to(torch.int32), : w.to(torch.int32)]
-            p = self.extract_single_image(img)
-            lafs = laf_from_center_scale_ori(
-                p["keypoints"].reshape(1, -1, 2),
-                6.0 * p["scales"].reshape(1, -1, 1, 1),
-                torch.rad2deg(p["oris"]).reshape(1, -1, 1),
-            ).to(device)
-            p["descriptors"] = self.laf_desc(img[None], lafs).reshape(-1, 128)
-            pred.append(p)
-        pred = {k: torch.stack([p[k] for p in pred], 0).to(device) for k in pred[0]}
-        return pred
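
Review note on the last statement of the removed `DoGHardNet.forward`: it turns a list of per-image dicts into one dict keyed by field name. A minimal sketch of that collation pattern follows, assuming plain lists in place of `torch.stack`; `collate` is a hypothetical helper name, not part of LightGlue.

```python
def collate(preds):
    """Turn a list of per-image dicts into one dict of per-key lists.

    The keys are taken from the first entry, as in the removed code.
    """
    if not preds:
        raise ValueError("need at least one per-image prediction")
    return {key: [p[key] for p in preds] for key in preds[0]}


if __name__ == "__main__":
    per_image = [
        {"keypoints": [[1.0, 2.0]], "scales": [0.5]},
        {"keypoints": [[3.0, 4.0]], "scales": [0.7]},
    ]
    print(collate(per_image))
```

The original stacks tensors rather than building lists, and `torch.stack` requires every image in the batch to contribute tensors of the same shape.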

+ 0 - 667
python/LightGlue/lightglue/lightglue.py

@@ -1,667 +0,0 @@
-import warnings
-from pathlib import Path
-from types import SimpleNamespace
-from typing import Callable, List, Optional, Tuple
-
-import numpy as np
-import torch
-import torch.nn.functional as F
-from torch import nn
-
-try:
-    from flash_attn.modules.mha import FlashCrossAttention
-except ModuleNotFoundError:
-    FlashCrossAttention = None
-
-if FlashCrossAttention or hasattr(F, "scaled_dot_product_attention"):
-    FLASH_AVAILABLE = True
-else:
-    FLASH_AVAILABLE = False
-
-torch.backends.cudnn.deterministic = True
-
-
-AMP_CUSTOM_FWD_F32 = (
-    torch.amp.custom_fwd(cast_inputs=torch.float32, device_type="cuda")
-    if hasattr(torch, "amp") and hasattr(torch.amp, "custom_fwd")
-    else torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
-)
-
-
-@AMP_CUSTOM_FWD_F32
-def normalize_keypoints(
-    kpts: torch.Tensor, size: Optional[torch.Tensor] = None
-) -> torch.Tensor:
-    if size is None:
-        size = 1 + kpts.max(-2).values - kpts.min(-2).values
-    elif not isinstance(size, torch.Tensor):
-        size = torch.tensor(size, device=kpts.device, dtype=kpts.dtype)
-    size = size.to(kpts)
-    shift = size / 2
-    scale = size.max(-1).values / 2
-    kpts = (kpts - shift[..., None, :]) / scale[..., None, None]
-    return kpts
-
-
-def pad_to_length(x: torch.Tensor, length: int) -> Tuple[torch.Tensor, torch.Tensor]:
-    if length <= x.shape[-2]:
-        return x, torch.ones_like(x[..., :1], dtype=torch.bool)
-    pad = torch.ones(
-        *x.shape[:-2], length - x.shape[-2], x.shape[-1], device=x.device, dtype=x.dtype
-    )
-    y = torch.cat([x, pad], dim=-2)
-    mask = torch.zeros(*y.shape[:-1], 1, dtype=torch.bool, device=x.device)
-    mask[..., : x.shape[-2], :] = True
-    return y, mask
-
-
-def rotate_half(x: torch.Tensor) -> torch.Tensor:
-    x = x.unflatten(-1, (-1, 2))
-    x1, x2 = x.unbind(dim=-1)
-    return torch.stack((-x2, x1), dim=-1).flatten(start_dim=-2)
-
-
-def apply_cached_rotary_emb(freqs: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
-    return (t * freqs[0]) + (rotate_half(t) * freqs[1])
-
-
-class LearnableFourierPositionalEncoding(nn.Module):
-    def __init__(self, M: int, dim: int, F_dim: int = None, gamma: float = 1.0) -> None:
-        super().__init__()
-        F_dim = F_dim if F_dim is not None else dim
-        self.gamma = gamma
-        self.Wr = nn.Linear(M, F_dim // 2, bias=False)
-        nn.init.normal_(self.Wr.weight.data, mean=0, std=self.gamma**-2)
-
-    def forward(self, x: torch.Tensor) -> torch.Tensor:
-        """encode position vector"""
-        projected = self.Wr(x)
-        cosines, sines = torch.cos(projected), torch.sin(projected)
-        emb = torch.stack([cosines, sines], 0).unsqueeze(-3)
-        return emb.repeat_interleave(2, dim=-1)
-
-
-class TokenConfidence(nn.Module):
-    def __init__(self, dim: int) -> None:
-        super().__init__()
-        self.token = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
-
-    def forward(self, desc0: torch.Tensor, desc1: torch.Tensor):
-        """get confidence tokens"""
-        return (
-            self.token(desc0.detach()).squeeze(-1),
-            self.token(desc1.detach()).squeeze(-1),
-        )
-
-
-class Attention(nn.Module):
-    def __init__(self, allow_flash: bool) -> None:
-        super().__init__()
-        if allow_flash and not FLASH_AVAILABLE:
-            warnings.warn(
-                "FlashAttention is not available. For optimal speed, "
-                "consider installing torch >= 2.0 or flash-attn.",
-                stacklevel=2,
-            )
-        self.enable_flash = allow_flash and FLASH_AVAILABLE
-        self.has_sdp = hasattr(F, "scaled_dot_product_attention")
-        if allow_flash and FlashCrossAttention:
-            self.flash_ = FlashCrossAttention()
-        if self.has_sdp:
-            torch.backends.cuda.enable_flash_sdp(allow_flash)
-
-    def forward(self, q, k, v, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
-        if q.shape[-2] == 0 or k.shape[-2] == 0:
-            return q.new_zeros((*q.shape[:-1], v.shape[-1]))
-        if self.enable_flash and q.device.type == "cuda":
-            # use torch 2.0 scaled_dot_product_attention with flash
-            if self.has_sdp:
-                args = [x.half().contiguous() for x in [q, k, v]]
-                v = F.scaled_dot_product_attention(*args, attn_mask=mask).to(q.dtype)
-                return v if mask is None else v.nan_to_num()
-            else:
-                assert mask is None
-                q, k, v = [x.transpose(-2, -3).contiguous() for x in [q, k, v]]
-                m = self.flash_(q.half(), torch.stack([k, v], 2).half())
-                return m.transpose(-2, -3).to(q.dtype).clone()
-        elif self.has_sdp:
-            args = [x.contiguous() for x in [q, k, v]]
-            v = F.scaled_dot_product_attention(*args, attn_mask=mask)
-            return v if mask is None else v.nan_to_num()
-        else:
-            s = q.shape[-1] ** -0.5
-            sim = torch.einsum("...id,...jd->...ij", q, k) * s
-            if mask is not None:
-                sim = sim.masked_fill(~mask, -float("inf"))
-            attn = F.softmax(sim, -1)
-            return torch.einsum("...ij,...jd->...id", attn, v)
-
-
-class SelfBlock(nn.Module):
-    def __init__(
-        self, embed_dim: int, num_heads: int, flash: bool = False, bias: bool = True
-    ) -> None:
-        super().__init__()
-        self.embed_dim = embed_dim
-        self.num_heads = num_heads
-        assert self.embed_dim % num_heads == 0
-        self.head_dim = self.embed_dim // num_heads
-        self.Wqkv = nn.Linear(embed_dim, 3 * embed_dim, bias=bias)
-        self.inner_attn = Attention(flash)
-        self.out_proj = nn.Linear(embed_dim, embed_dim, bias=bias)
-        self.ffn = nn.Sequential(
-            nn.Linear(2 * embed_dim, 2 * embed_dim),
-            nn.LayerNorm(2 * embed_dim, elementwise_affine=True),
-            nn.GELU(),
-            nn.Linear(2 * embed_dim, embed_dim),
-        )
-
-    def forward(
-        self,
-        x: torch.Tensor,
-        encoding: torch.Tensor,
-        mask: Optional[torch.Tensor] = None,
-    ) -> torch.Tensor:
-        qkv = self.Wqkv(x)
-        qkv = qkv.unflatten(-1, (self.num_heads, -1, 3)).transpose(1, 2)
-        q, k, v = qkv[..., 0], qkv[..., 1], qkv[..., 2]
-        q = apply_cached_rotary_emb(encoding, q)
-        k = apply_cached_rotary_emb(encoding, k)
-        context = self.inner_attn(q, k, v, mask=mask)
-        message = self.out_proj(context.transpose(1, 2).flatten(start_dim=-2))
-        return x + self.ffn(torch.cat([x, message], -1))
-
-
-class CrossBlock(nn.Module):
-    def __init__(
-        self, embed_dim: int, num_heads: int, flash: bool = False, bias: bool = True
-    ) -> None:
-        super().__init__()
-        self.heads = num_heads
-        dim_head = embed_dim // num_heads
-        self.scale = dim_head**-0.5
-        inner_dim = dim_head * num_heads
-        self.to_qk = nn.Linear(embed_dim, inner_dim, bias=bias)
-        self.to_v = nn.Linear(embed_dim, inner_dim, bias=bias)
-        self.to_out = nn.Linear(inner_dim, embed_dim, bias=bias)
-        self.ffn = nn.Sequential(
-            nn.Linear(2 * embed_dim, 2 * embed_dim),
-            nn.LayerNorm(2 * embed_dim, elementwise_affine=True),
-            nn.GELU(),
-            nn.Linear(2 * embed_dim, embed_dim),
-        )
-        if flash and FLASH_AVAILABLE:
-            self.flash = Attention(True)
-        else:
-            self.flash = None
-
-    def map_(self, func: Callable, x0: torch.Tensor, x1: torch.Tensor):
-        return func(x0), func(x1)
-
-    def forward(
-        self, x0: torch.Tensor, x1: torch.Tensor, mask: Optional[torch.Tensor] = None
-    ) -> Tuple[torch.Tensor, torch.Tensor]:
-        qk0, qk1 = self.map_(self.to_qk, x0, x1)
-        v0, v1 = self.map_(self.to_v, x0, x1)
-        qk0, qk1, v0, v1 = map(
-            lambda t: t.unflatten(-1, (self.heads, -1)).transpose(1, 2),
-            (qk0, qk1, v0, v1),
-        )
-        if self.flash is not None and qk0.device.type == "cuda":
-            m0 = self.flash(qk0, qk1, v1, mask)
-            m1 = self.flash(
-                qk1, qk0, v0, mask.transpose(-1, -2) if mask is not None else None
-            )
-        else:
-            qk0, qk1 = qk0 * self.scale**0.5, qk1 * self.scale**0.5
-            sim = torch.einsum("bhid, bhjd -> bhij", qk0, qk1)
-            if mask is not None:
-                sim = sim.masked_fill(~mask, -float("inf"))
-            attn01 = F.softmax(sim, dim=-1)
-            attn10 = F.softmax(sim.transpose(-2, -1).contiguous(), dim=-1)
-            m0 = torch.einsum("bhij, bhjd -> bhid", attn01, v1)
-            m1 = torch.einsum("bhji, bhjd -> bhid", attn10.transpose(-2, -1), v0)
-            if mask is not None:
-                m0, m1 = m0.nan_to_num(), m1.nan_to_num()
-        m0, m1 = self.map_(lambda t: t.transpose(1, 2).flatten(start_dim=-2), m0, m1)
-        m0, m1 = self.map_(self.to_out, m0, m1)
-        x0 = x0 + self.ffn(torch.cat([x0, m0], -1))
-        x1 = x1 + self.ffn(torch.cat([x1, m1], -1))
-        return x0, x1
-
-
-class TransformerLayer(nn.Module):
-    def __init__(self, *args, **kwargs):
-        super().__init__()
-        self.self_attn = SelfBlock(*args, **kwargs)
-        self.cross_attn = CrossBlock(*args, **kwargs)
-
-    def forward(
-        self,
-        desc0,
-        desc1,
-        encoding0,
-        encoding1,
-        mask0: Optional[torch.Tensor] = None,
-        mask1: Optional[torch.Tensor] = None,
-    ):
-        if mask0 is not None and mask1 is not None:
-            return self.masked_forward(desc0, desc1, encoding0, encoding1, mask0, mask1)
-        else:
-            desc0 = self.self_attn(desc0, encoding0)
-            desc1 = self.self_attn(desc1, encoding1)
-            return self.cross_attn(desc0, desc1)
-
-    # This part is compiled and allows padding inputs
-    def masked_forward(self, desc0, desc1, encoding0, encoding1, mask0, mask1):
-        mask = mask0 & mask1.transpose(-1, -2)
-        mask0 = mask0 & mask0.transpose(-1, -2)
-        mask1 = mask1 & mask1.transpose(-1, -2)
-        desc0 = self.self_attn(desc0, encoding0, mask0)
-        desc1 = self.self_attn(desc1, encoding1, mask1)
-        return self.cross_attn(desc0, desc1, mask)
-
-
-def sigmoid_log_double_softmax(
-    sim: torch.Tensor, z0: torch.Tensor, z1: torch.Tensor
-) -> torch.Tensor:
-    """create the log assignment matrix from logits and similarity"""
-    b, m, n = sim.shape
-    certainties = F.logsigmoid(z0) + F.logsigmoid(z1).transpose(1, 2)
-    scores0 = F.log_softmax(sim, 2)
-    scores1 = F.log_softmax(sim.transpose(-1, -2).contiguous(), 2).transpose(-1, -2)
-    scores = sim.new_full((b, m + 1, n + 1), 0)
-    scores[:, :m, :n] = scores0 + scores1 + certainties
-    scores[:, :-1, -1] = F.logsigmoid(-z0.squeeze(-1))
-    scores[:, -1, :-1] = F.logsigmoid(-z1.squeeze(-1))
-    return scores
-
-
-class MatchAssignment(nn.Module):
-    def __init__(self, dim: int) -> None:
-        super().__init__()
-        self.dim = dim
-        self.matchability = nn.Linear(dim, 1, bias=True)
-        self.final_proj = nn.Linear(dim, dim, bias=True)
-
-    def forward(self, desc0: torch.Tensor, desc1: torch.Tensor):
-        """build assignment matrix from descriptors"""
-        mdesc0, mdesc1 = self.final_proj(desc0), self.final_proj(desc1)
-        _, _, d = mdesc0.shape
-        mdesc0, mdesc1 = mdesc0 / d**0.25, mdesc1 / d**0.25
-        sim = torch.einsum("bmd,bnd->bmn", mdesc0, mdesc1)
-        z0 = self.matchability(desc0)
-        z1 = self.matchability(desc1)
-        scores = sigmoid_log_double_softmax(sim, z0, z1)
-        return scores, sim
-
-    def get_matchability(self, desc: torch.Tensor):
-        return torch.sigmoid(self.matchability(desc)).squeeze(-1)
-
-
-def filter_matches(scores: torch.Tensor, th: float):
-    """obtain matches from a log assignment matrix [Bx M+1 x N+1]"""
-    max0, max1 = scores[:, :-1, :-1].max(2), scores[:, :-1, :-1].max(1)
-    m0, m1 = max0.indices, max1.indices
-    indices0 = torch.arange(m0.shape[1], device=m0.device)[None]
-    indices1 = torch.arange(m1.shape[1], device=m1.device)[None]
-    mutual0 = indices0 == m1.gather(1, m0)
-    mutual1 = indices1 == m0.gather(1, m1)
-    max0_exp = max0.values.exp()
-    zero = max0_exp.new_tensor(0)
-    mscores0 = torch.where(mutual0, max0_exp, zero)
-    mscores1 = torch.where(mutual1, mscores0.gather(1, m1), zero)
-    valid0 = mutual0 & (mscores0 > th)
-    valid1 = mutual1 & valid0.gather(1, m1)
-    m0 = torch.where(valid0, m0, -1)
-    m1 = torch.where(valid1, m1, -1)
-    return m0, m1, mscores0, mscores1
-
-
-class LightGlue(nn.Module):
-    default_conf = {
-        "name": "lightglue",  # just for interfacing
-        "input_dim": 256,  # input descriptor dimension (autoselected from weights)
-        "descriptor_dim": 256,
-        "add_scale_ori": False,
-        "n_layers": 9,
-        "num_heads": 4,
-        "flash": True,  # enable FlashAttention if available.
-        "mp": False,  # enable mixed precision
-        "depth_confidence": 0.95,  # early stopping, disable with -1
-        "width_confidence": 0.99,  # point pruning, disable with -1
-        "filter_threshold": 0.1,  # match threshold
-        "weights": None,
-    }
-
-    # Point pruning involves an overhead (gather).
-    # Therefore, we only activate it if there are enough keypoints.
-    pruning_keypoint_thresholds = {
-        "cpu": -1,
-        "mps": -1,
-        "cuda": 1024,
-        "flash": 1536,
-    }
-
-    required_data_keys = ["image0", "image1"]
-
-    version = "v0.1_arxiv"
-    url = "https://github.com/cvg/LightGlue/releases/download/{}/{}.pth"
-
-    features = {
-        "superpoint": {
-            "weights": "superpoint_lightglue",
-            "input_dim": 256,
-        },
-        "disk": {
-            "weights": "disk_lightglue",
-            "input_dim": 128,
-        },
-        "aliked": {
-            "weights": "aliked_lightglue",
-            "input_dim": 128,
-        },
-        "raco-aliked": {
-            "weights": "raco_aliked_lightglue",
-            "input_dim": 128,
-        },
-        "sift": {
-            "weights": "sift_lightglue",
-            "input_dim": 128,
-            "add_scale_ori": True,
-        },
-        "doghardnet": {
-            "weights": "doghardnet_lightglue",
-            "input_dim": 128,
-            "add_scale_ori": True,
-        },
-    }
-
-    def __init__(self, features="superpoint", **conf) -> None:
-        super().__init__()
-        self.conf = conf = SimpleNamespace(**{**self.default_conf, **conf})
-        if features is not None:
-            if features not in self.features:
-                raise ValueError(
-                    f"Unsupported features: {features} not in "
-                    f"{{{','.join(self.features)}}}"
-                )
-            for k, v in self.features[features].items():
-                setattr(conf, k, v)
-
-        if conf.input_dim != conf.descriptor_dim:
-            self.input_proj = nn.Linear(conf.input_dim, conf.descriptor_dim, bias=True)
-        else:
-            self.input_proj = nn.Identity()
-
-        head_dim = conf.descriptor_dim // conf.num_heads
-        self.posenc = LearnableFourierPositionalEncoding(
-            2 + 2 * self.conf.add_scale_ori, head_dim, head_dim
-        )
-
-        h, n, d = conf.num_heads, conf.n_layers, conf.descriptor_dim
-
-        self.transformers = nn.ModuleList(
-            [TransformerLayer(d, h, conf.flash) for _ in range(n)]
-        )
-
-        self.log_assignment = nn.ModuleList([MatchAssignment(d) for _ in range(n)])
-        self.token_confidence = nn.ModuleList(
-            [TokenConfidence(d) for _ in range(n - 1)]
-        )
-        self.register_buffer(
-            "confidence_thresholds",
-            torch.Tensor(
-                [self.confidence_threshold(i) for i in range(self.conf.n_layers)]
-            ),
-        )
-
-        state_dict = None
-        if features is not None:
-            fname = f"{conf.weights}_{self.version.replace('.', '-')}.pth"
-            state_dict = torch.hub.load_state_dict_from_url(
-                self.url.format(self.version, self.conf.weights),
-                file_name=fname,
-            )
-            self.load_state_dict(state_dict, strict=False)
-        elif conf.weights is not None:
-            path = Path(__file__).parent
-            path = path / "weights/{}.pth".format(self.conf.weights)
-            state_dict = torch.load(str(path), map_location="cpu")
-
-        if state_dict:
-            # rename old state dict entries
-            for i in range(self.conf.n_layers):
-                pattern = f"self_attn.{i}", f"transformers.{i}.self_attn"
-                state_dict = {k.replace(*pattern): v for k, v in state_dict.items()}
-                pattern = f"cross_attn.{i}", f"transformers.{i}.cross_attn"
-                state_dict = {k.replace(*pattern): v for k, v in state_dict.items()}
-            self.load_state_dict(state_dict, strict=False)
-
-        # static lengths LightGlue is compiled for (only used with torch.compile)
-        self.static_lengths = None
-
-    def compile(
-        self, mode="reduce-overhead", static_lengths=[256, 512, 768, 1024, 1280, 1536]
-    ):
-        if self.conf.width_confidence != -1:
-            warnings.warn(
-                "Point pruning is partially disabled for compiled forward.",
-                stacklevel=2,
-            )
-
-        torch._inductor.cudagraph_mark_step_begin()
-        for i in range(self.conf.n_layers):
-            self.transformers[i].masked_forward = torch.compile(
-                self.transformers[i].masked_forward, mode=mode, fullgraph=True
-            )
-
-        self.static_lengths = static_lengths
-
-    def forward(self, data: dict) -> dict:
-        """
-        Match keypoints and descriptors between two images
-
-        Input (dict):
-            image0: dict
-                keypoints: [B x M x 2]
-                descriptors: [B x M x D]
-                image: [B x C x H x W] or image_size: [B x 2]
-            image1: dict
-                keypoints: [B x N x 2]
-                descriptors: [B x N x D]
-                image: [B x C x H x W] or image_size: [B x 2]
-        Output (dict):
-            matches0: [B x M]
-            matching_scores0: [B x M]
-            matches1: [B x N]
-            matching_scores1: [B x N]
-            matches: List[[Si x 2]]
-            scores: List[[Si]]
-            stop: int
-            prune0: [B x M]
-            prune1: [B x N]
-        """
-        with torch.autocast(enabled=self.conf.mp, device_type="cuda"):
-            return self._forward(data)
-
-    def _forward(self, data: dict) -> dict:
-        for key in self.required_data_keys:
-            assert key in data, f"Missing key {key} in data"
-        data0, data1 = data["image0"], data["image1"]
-        kpts0, kpts1 = data0["keypoints"], data1["keypoints"]
-        b, m, _ = kpts0.shape
-        b, n, _ = kpts1.shape
-        device = kpts0.device
-        size0, size1 = data0.get("image_size"), data1.get("image_size")
-        kpts0 = normalize_keypoints(kpts0, size0).clone()
-        kpts1 = normalize_keypoints(kpts1, size1).clone()
-
-        if self.conf.add_scale_ori:
-            kpts0 = torch.cat(
-                [kpts0] + [data0[k].unsqueeze(-1) for k in ("scales", "oris")], -1
-            )
-            kpts1 = torch.cat(
-                [kpts1] + [data1[k].unsqueeze(-1) for k in ("scales", "oris")], -1
-            )
-        desc0 = data0["descriptors"].detach().contiguous()
-        desc1 = data1["descriptors"].detach().contiguous()
-
-        assert desc0.shape[-1] == self.conf.input_dim
-        assert desc1.shape[-1] == self.conf.input_dim
-
-        if torch.is_autocast_enabled():
-            desc0 = desc0.half()
-            desc1 = desc1.half()
-
-        mask0, mask1 = None, None
-        c = max(m, n)
-        do_compile = self.static_lengths and c <= max(self.static_lengths)
-        if do_compile:
-            kn = min([k for k in self.static_lengths if k >= c])
-            desc0, mask0 = pad_to_length(desc0, kn)
-            desc1, mask1 = pad_to_length(desc1, kn)
-            kpts0, _ = pad_to_length(kpts0, kn)
-            kpts1, _ = pad_to_length(kpts1, kn)
-        desc0 = self.input_proj(desc0)
-        desc1 = self.input_proj(desc1)
-        # cache positional embeddings
-        encoding0 = self.posenc(kpts0)
-        encoding1 = self.posenc(kpts1)
-
-        # GNN + final_proj + assignment
-        do_early_stop = self.conf.depth_confidence > 0
-        do_point_pruning = self.conf.width_confidence > 0 and not do_compile
-        pruning_th = self.pruning_min_kpts(device)
-        if do_point_pruning:
-            ind0 = torch.arange(0, m, device=device)[None]
-            ind1 = torch.arange(0, n, device=device)[None]
-            # We store the index of the layer at which pruning is detected.
-            prune0 = torch.ones_like(ind0)
-            prune1 = torch.ones_like(ind1)
-        token0, token1 = None, None
-        for i in range(self.conf.n_layers):
-            if desc0.shape[1] == 0 or desc1.shape[1] == 0:  # no keypoints
-                break
-            desc0, desc1 = self.transformers[i](
-                desc0, desc1, encoding0, encoding1, mask0=mask0, mask1=mask1
-            )
-            if i == self.conf.n_layers - 1:
-                continue  # no early stopping or adaptive width at last layer
-
-            if do_early_stop:
-                token0, token1 = self.token_confidence[i](desc0, desc1)
-                if self.check_if_stop(token0[..., :m], token1[..., :n], i, m + n):
-                    break
-            if do_point_pruning and desc0.shape[-2] > pruning_th:
-                scores0 = self.log_assignment[i].get_matchability(desc0)
-                prunemask0 = self.get_pruning_mask(token0, scores0, i)
-                keep0 = torch.where(prunemask0)[1]
-                ind0 = ind0.index_select(1, keep0)
-                desc0 = desc0.index_select(1, keep0)
-                encoding0 = encoding0.index_select(-2, keep0)
-                prune0[:, ind0] += 1
-            if do_point_pruning and desc1.shape[-2] > pruning_th:
-                scores1 = self.log_assignment[i].get_matchability(desc1)
-                prunemask1 = self.get_pruning_mask(token1, scores1, i)
-                keep1 = torch.where(prunemask1)[1]
-                ind1 = ind1.index_select(1, keep1)
-                desc1 = desc1.index_select(1, keep1)
-                encoding1 = encoding1.index_select(-2, keep1)
-                prune1[:, ind1] += 1
-
-        if desc0.shape[1] == 0 or desc1.shape[1] == 0:  # no keypoints
-            m0 = desc0.new_full((b, m), -1, dtype=torch.long)
-            m1 = desc1.new_full((b, n), -1, dtype=torch.long)
-            mscores0 = desc0.new_zeros((b, m))
-            mscores1 = desc1.new_zeros((b, n))
-            matches = desc0.new_empty((b, 0, 2), dtype=torch.long)
-            mscores = desc0.new_empty((b, 0))
-            if not do_point_pruning:
-                prune0 = torch.ones_like(mscores0) * self.conf.n_layers
-                prune1 = torch.ones_like(mscores1) * self.conf.n_layers
-            return {
-                "matches0": m0,
-                "matches1": m1,
-                "matching_scores0": mscores0,
-                "matching_scores1": mscores1,
-                "stop": i + 1,
-                "matches": matches,
-                "scores": mscores,
-                "prune0": prune0,
-                "prune1": prune1,
-            }
-
-        desc0, desc1 = desc0[..., :m, :], desc1[..., :n, :]  # remove padding
-        scores, _ = self.log_assignment[i](desc0, desc1)
-        m0, m1, mscores0, mscores1 = filter_matches(scores, self.conf.filter_threshold)
-        matches, mscores = [], []
-        for k in range(b):
-            valid = m0[k] > -1
-            m_indices_0 = torch.where(valid)[0]
-            m_indices_1 = m0[k][valid]
-            if do_point_pruning:
-                m_indices_0 = ind0[k, m_indices_0]
-                m_indices_1 = ind1[k, m_indices_1]
-            matches.append(torch.stack([m_indices_0, m_indices_1], -1))
-            mscores.append(mscores0[k][valid])
-
-        # TODO: Remove when hloc switches to the compact format.
-        if do_point_pruning:
-            m0_ = torch.full((b, m), -1, device=m0.device, dtype=m0.dtype)
-            m1_ = torch.full((b, n), -1, device=m1.device, dtype=m1.dtype)
-            m0_[:, ind0] = torch.where(m0 == -1, -1, ind1.gather(1, m0.clamp(min=0)))
-            m1_[:, ind1] = torch.where(m1 == -1, -1, ind0.gather(1, m1.clamp(min=0)))
-            mscores0_ = torch.zeros((b, m), device=mscores0.device)
-            mscores1_ = torch.zeros((b, n), device=mscores1.device)
-            mscores0_[:, ind0] = mscores0
-            mscores1_[:, ind1] = mscores1
-            m0, m1, mscores0, mscores1 = m0_, m1_, mscores0_, mscores1_
-        else:
-            prune0 = torch.ones_like(mscores0) * self.conf.n_layers
-            prune1 = torch.ones_like(mscores1) * self.conf.n_layers
-
-        return {
-            "matches0": m0,
-            "matches1": m1,
-            "matching_scores0": mscores0,
-            "matching_scores1": mscores1,
-            "stop": i + 1,
-            "matches": matches,
-            "scores": mscores,
-            "prune0": prune0,
-            "prune1": prune1,
-        }
-
-    def confidence_threshold(self, layer_index: int) -> float:
-        """scaled confidence threshold"""
-        threshold = 0.8 + 0.1 * np.exp(-4.0 * layer_index / self.conf.n_layers)
-        return np.clip(threshold, 0, 1)
-
-    def get_pruning_mask(
-        self, confidences: torch.Tensor, scores: torch.Tensor, layer_index: int
-    ) -> torch.Tensor:
-        """mask points which should be removed"""
-        keep = scores > (1 - self.conf.width_confidence)
-        if confidences is not None:  # Low-confidence points are never pruned.
-            keep |= confidences <= self.confidence_thresholds[layer_index]
-        return keep
-
-    def check_if_stop(
-        self,
-        confidences0: torch.Tensor,
-        confidences1: torch.Tensor,
-        layer_index: int,
-        num_points: int,
-    ) -> torch.Tensor:
-        """evaluate stopping condition"""
-        confidences = torch.cat([confidences0, confidences1], -1)
-        threshold = self.confidence_thresholds[layer_index]
-        ratio_confident = 1.0 - (confidences < threshold).float().sum() / num_points
-        return ratio_confident > self.conf.depth_confidence
-
-    def pruning_min_kpts(self, device: torch.device):
-        if self.conf.flash and FLASH_AVAILABLE and device.type == "cuda":
-            return self.pruning_keypoint_thresholds["flash"]
-        else:
-            return self.pruning_keypoint_thresholds[device.type]
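
The early-stopping rule in the removed `confidence_threshold` and `check_if_stop` methods can be restated without PyTorch. The block below is a minimal sketch and not the library's implementation: it assumes confidences arrive as a plain, non-empty list of floats, and the name `should_stop` is hypothetical. It divides by the list length, whereas the original divides by a separately passed `num_points`.

```python
import math


def confidence_threshold(layer_index: int, n_layers: int) -> float:
    """Layer-dependent threshold: 0.9 at layer 0, decaying toward 0.8."""
    threshold = 0.8 + 0.1 * math.exp(-4.0 * layer_index / n_layers)
    return min(max(threshold, 0.0), 1.0)


def should_stop(confidences, layer_index, n_layers, depth_confidence):
    """Hypothetical helper: stop once enough points are confident.

    Assumes `confidences` is a non-empty list covering both images.
    """
    threshold = confidence_threshold(layer_index, n_layers)
    n_unconfident = sum(1 for c in confidences if c < threshold)
    ratio_confident = 1.0 - n_unconfident / len(confidences)
    return ratio_confident > depth_confidence
```

Because the threshold decreases with depth, a point with a given confidence is more likely to count as confident in later layers, which makes stopping progressively easier.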

+ 0 - 216
python/LightGlue/lightglue/sift.py

@@ -1,216 +0,0 @@
-import warnings
-
-import cv2
-import numpy as np
-import torch
-from kornia.color import rgb_to_grayscale
-from packaging import version
-
-try:
-    import pycolmap
-except ImportError:
-    pycolmap = None
-
-from .utils import Extractor
-
-
-def filter_dog_point(points, scales, angles, image_shape, nms_radius, scores=None):
-    h, w = image_shape
-    ij = np.round(points - 0.5).astype(int).T[::-1]
-
-    # Remove duplicate points (identical coordinates).
-    # Pick highest scale or score
-    s = scales if scores is None else scores
-    buffer = np.zeros((h, w))
-    np.maximum.at(buffer, tuple(ij), s)
-    keep = np.where(buffer[tuple(ij)] == s)[0]
-
-    # Pick lowest angle (arbitrary).
-    ij = ij[:, keep]
-    buffer[:] = np.inf
-    o_abs = np.abs(angles[keep])
-    np.minimum.at(buffer, tuple(ij), o_abs)
-    mask = buffer[tuple(ij)] == o_abs
-    ij = ij[:, mask]
-    keep = keep[mask]
-
-    if nms_radius > 0:
-        # Apply NMS on the remaining points
-        buffer[:] = 0
-        buffer[tuple(ij)] = s[keep]  # scores or scale
-
-        local_max = torch.nn.functional.max_pool2d(
-            torch.from_numpy(buffer).unsqueeze(0),
-            kernel_size=nms_radius * 2 + 1,
-            stride=1,
-            padding=nms_radius,
-        ).squeeze(0)
-        is_local_max = buffer == local_max.numpy()
-        keep = keep[is_local_max[tuple(ij)]]
-    return keep
-
-
-def sift_to_rootsift(x: torch.Tensor, eps=1e-6) -> torch.Tensor:
-    x = torch.nn.functional.normalize(x, p=1, dim=-1, eps=eps)
-    x.clip_(min=eps).sqrt_()
-    return torch.nn.functional.normalize(x, p=2, dim=-1, eps=eps)
-
-
-def run_opencv_sift(features: cv2.Feature2D, image: np.ndarray) -> tuple:
-    """
-    Detect keypoints and compute descriptors using an OpenCV detector.
-    Args:
-        features: OpenCV based keypoints detector and descriptor
-        image: Grayscale image of uint8 data type
-    Returns:
-        points: array of keypoint (x, y) locations
-        scores: 1D array of responses
-        scales: 1D array of keypoint sizes
-        angles: 1D array of orientations in radians
-        descriptors: array of descriptors, one row per keypoint
-    """
-    detections, descriptors = features.detectAndCompute(image, None)
-    points = np.array([k.pt for k in detections], dtype=np.float32)
-    scores = np.array([k.response for k in detections], dtype=np.float32)
-    scales = np.array([k.size for k in detections], dtype=np.float32)
-    angles = np.deg2rad(np.array([k.angle for k in detections], dtype=np.float32))
-    return points, scores, scales, angles, descriptors
-
-
-class SIFT(Extractor):
-    default_conf = {
-        "rootsift": True,
-        "nms_radius": 0,  # None to disable filtering entirely.
-        "max_num_keypoints": 4096,
-        "backend": "opencv",  # in {opencv, pycolmap, pycolmap_cpu, pycolmap_cuda}
-        "detection_threshold": 0.0066667,  # from COLMAP
-        "edge_threshold": 10,
-        "first_octave": -1,  # only used by pycolmap, the default of COLMAP
-        "num_octaves": 4,
-    }
-
-    preprocess_conf = {
-        "resize": 1024,
-    }
-
-    required_data_keys = ["image"]
-
-    def __init__(self, **conf):
-        super().__init__(**conf)  # Update with default configuration.
-        backend = self.conf.backend
-        if backend.startswith("pycolmap"):
-            if pycolmap is None:
-                raise ImportError(
-                    "Cannot find module pycolmap: install it with pip "
-                    "or use backend=opencv."
-                )
-            options = {
-                "peak_threshold": self.conf.detection_threshold,
-                "edge_threshold": self.conf.edge_threshold,
-                "first_octave": self.conf.first_octave,
-                "num_octaves": self.conf.num_octaves,
-                "normalization": pycolmap.Normalization.L2,  # L1_ROOT is buggy.
-            }
-            device = (
-                "auto" if backend == "pycolmap" else backend.replace("pycolmap_", "")
-            )
-            if (
-                backend == "pycolmap_cpu" or not pycolmap.has_cuda
-            ) and version.parse(pycolmap.__version__) < version.parse("0.5.0"):
-                warnings.warn(
-                    "The pycolmap CPU SIFT is buggy in version < 0.5.0, "
-                    "consider upgrading pycolmap or use the CUDA version.",
-                    stacklevel=1,
-                )
-            else:
-                options["max_num_features"] = self.conf.max_num_keypoints
-            self.sift = pycolmap.Sift(options=options, device=device)
-        elif backend == "opencv":
-            self.sift = cv2.SIFT_create(
-                contrastThreshold=self.conf.detection_threshold,
-                nfeatures=self.conf.max_num_keypoints,
-                edgeThreshold=self.conf.edge_threshold,
-                nOctaveLayers=self.conf.num_octaves,
-            )
-        else:
-            backends = {"opencv", "pycolmap", "pycolmap_cpu", "pycolmap_cuda"}
-            raise ValueError(
-                f"Unknown backend: {backend} not in " f"{{{','.join(backends)}}}."
-            )
-
-    def extract_single_image(self, image: torch.Tensor):
-        image_np = image.cpu().numpy().squeeze(0)
-
-        if self.conf.backend.startswith("pycolmap"):
-            if version.parse(pycolmap.__version__) >= version.parse("0.5.0"):
-                detections, descriptors = self.sift.extract(image_np)
-                scores = None  # Scores are not exposed by COLMAP anymore.
-            else:
-                detections, scores, descriptors = self.sift.extract(image_np)
-            keypoints = detections[:, :2]  # Keep only (x, y).
-            scales, angles = detections[:, -2:].T
-            if scores is not None and (
-                self.conf.backend == "pycolmap_cpu" or not pycolmap.has_cuda
-            ):
-                # Set the scores as a combination of abs. response and scale.
-                scores = np.abs(scores) * scales
-        elif self.conf.backend == "opencv":
-            # TODO: Check if opencv keypoints are already in corner convention
-            keypoints, scores, scales, angles, descriptors = run_opencv_sift(
-                self.sift, (image_np * 255.0).astype(np.uint8)
-            )
-        pred = {
-            "keypoints": keypoints,
-            "scales": scales,
-            "oris": angles,
-            "descriptors": descriptors,
-        }
-        if scores is not None:
-            pred["keypoint_scores"] = scores
-
-        # sometimes pycolmap returns points outside the image. We remove them
-        if self.conf.backend.startswith("pycolmap"):
-            is_inside = (
-                pred["keypoints"] + 0.5 < np.array([image_np.shape[-2:][::-1]])
-            ).all(-1)
-            pred = {k: v[is_inside] for k, v in pred.items()}
-
-        if self.conf.nms_radius is not None:
-            keep = filter_dog_point(
-                pred["keypoints"],
-                pred["scales"],
-                pred["oris"],
-                image_np.shape,
-                self.conf.nms_radius,
-                scores=pred.get("keypoint_scores"),
-            )
-            pred = {k: v[keep] for k, v in pred.items()}
-
-        pred = {k: torch.from_numpy(v) for k, v in pred.items()}
-        if scores is not None:
-            # Keep the k keypoints with highest score
-            num_points = self.conf.max_num_keypoints
-            if num_points is not None and len(pred["keypoints"]) > num_points:
-                indices = torch.topk(pred["keypoint_scores"], num_points).indices
-                pred = {k: v[indices] for k, v in pred.items()}
-
-        return pred
-
-    def forward(self, data: dict) -> dict:
-        image = data["image"]
-        if image.shape[1] == 3:
-            image = rgb_to_grayscale(image)
-        device = image.device
-        image = image.cpu()
-        pred = []
-        for k in range(len(image)):
-            img = image[k]
-            if "image_size" in data.keys():
-                # avoid extracting points in padded areas
-                w, h = data["image_size"][k]
-                img = img[:, :h, :w]
-            p = self.extract_single_image(img)
-            pred.append(p)
-        pred = {k: torch.stack([p[k] for p in pred], 0).to(device) for k in pred[0]}
-        if self.conf.rootsift:
-            pred["descriptors"] = sift_to_rootsift(pred["descriptors"])
-        return pred
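
The RootSIFT conversion performed by the removed `sift_to_rootsift` function (L1-normalize, clip, element-wise square root, L2-normalize) can be illustrated for a single descriptor in plain Python. This is a sketch under the assumption that a descriptor is a list of floats; the original operates on batched tensors with `torch.nn.functional.normalize`.

```python
import math


def sift_to_rootsift(desc, eps=1e-6):
    """Convert one SIFT descriptor (list of floats) to RootSIFT."""
    # L1 normalization; the norm is floored at eps to avoid division by zero.
    l1 = max(sum(abs(v) for v in desc), eps)
    # Clip to at least eps, then take the element-wise square root.
    rooted = [math.sqrt(max(v / l1, eps)) for v in desc]
    # Final L2 normalization, again with a floored norm.
    l2 = max(math.sqrt(sum(v * v for v in rooted)), eps)
    return [v / l2 for v in rooted]
```

The square root compresses large histogram bins, so comparing RootSIFT vectors with Euclidean distance behaves like comparing the original descriptors with the Hellinger kernel.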

+ 0 - 227
python/LightGlue/lightglue/superpoint.py

@@ -1,227 +0,0 @@
-# %BANNER_BEGIN%
-# ---------------------------------------------------------------------
-# %COPYRIGHT_BEGIN%
-#
-#  Magic Leap, Inc. ("COMPANY") CONFIDENTIAL
-#
-#  Unpublished Copyright (c) 2020
-#  Magic Leap, Inc., All Rights Reserved.
-#
-# NOTICE:  All information contained herein is, and remains the property
-# of COMPANY. The intellectual and technical concepts contained herein
-# are proprietary to COMPANY and may be covered by U.S. and Foreign
-# Patents, patents in process, and are protected by trade secret or
-# copyright law.  Dissemination of this information or reproduction of
-# this material is strictly forbidden unless prior written permission is
-# obtained from COMPANY.  Access to the source code contained herein is
-# hereby forbidden to anyone except current COMPANY employees, managers
-# or contractors who have executed Confidentiality and Non-disclosure
-# agreements explicitly covering such access.
-#
-# The copyright notice above does not evidence any actual or intended
-# publication or disclosure  of  this source code, which includes
-# information that is confidential and/or proprietary, and is a trade
-# secret, of  COMPANY.   ANY REPRODUCTION, MODIFICATION, DISTRIBUTION,
-# PUBLIC  PERFORMANCE, OR PUBLIC DISPLAY OF OR THROUGH USE  OF THIS
-# SOURCE CODE  WITHOUT THE EXPRESS WRITTEN CONSENT OF COMPANY IS
-# STRICTLY PROHIBITED, AND IN VIOLATION OF APPLICABLE LAWS AND
-# INTERNATIONAL TREATIES.  THE RECEIPT OR POSSESSION OF  THIS SOURCE
-# CODE AND/OR RELATED INFORMATION DOES NOT CONVEY OR IMPLY ANY RIGHTS
-# TO REPRODUCE, DISCLOSE OR DISTRIBUTE ITS CONTENTS, OR TO MANUFACTURE,
-# USE, OR SELL ANYTHING THAT IT  MAY DESCRIBE, IN WHOLE OR IN PART.
-#
-# %COPYRIGHT_END%
-# ----------------------------------------------------------------------
-# %AUTHORS_BEGIN%
-#
-#  Originating Authors: Paul-Edouard Sarlin
-#
-# %AUTHORS_END%
-# --------------------------------------------------------------------*/
-# %BANNER_END%
-
-# Adapted by Remi Pautrat, Philipp Lindenberger
-
-import torch
-from kornia.color import rgb_to_grayscale
-from torch import nn
-
-from .utils import Extractor
-
-
-def simple_nms(scores, nms_radius: int):
-    """Fast Non-maximum suppression to remove nearby points"""
-    assert nms_radius >= 0
-
-    def max_pool(x):
-        return torch.nn.functional.max_pool2d(
-            x, kernel_size=nms_radius * 2 + 1, stride=1, padding=nms_radius
-        )
-
-    zeros = torch.zeros_like(scores)
-    max_mask = scores == max_pool(scores)
-    for _ in range(2):
-        supp_mask = max_pool(max_mask.float()) > 0
-        supp_scores = torch.where(supp_mask, zeros, scores)
-        new_max_mask = supp_scores == max_pool(supp_scores)
-        max_mask = max_mask | (new_max_mask & (~supp_mask))
-    return torch.where(max_mask, scores, zeros)
-
-
-def top_k_keypoints(keypoints, scores, k):
-    if k >= len(keypoints):
-        return keypoints, scores
-    scores, indices = torch.topk(scores, k, dim=0, sorted=True)
-    return keypoints[indices], scores
-
-
-def sample_descriptors(keypoints, descriptors, s: int = 8):
-    """Interpolate descriptors at keypoint locations"""
-    b, c, h, w = descriptors.shape
-    keypoints = keypoints - s / 2 + 0.5
-    keypoints /= torch.tensor(
-        [(w * s - s / 2 - 0.5), (h * s - s / 2 - 0.5)],
-    ).to(
-        keypoints
-    )[None]
-    keypoints = keypoints * 2 - 1  # normalize to (-1, 1)
-    args = {"align_corners": True} if torch.__version__ >= "1.3" else {}
-    descriptors = torch.nn.functional.grid_sample(
-        descriptors, keypoints.view(b, 1, -1, 2), mode="bilinear", **args
-    )
-    descriptors = torch.nn.functional.normalize(
-        descriptors.reshape(b, c, -1), p=2, dim=1
-    )
-    return descriptors
-
-
-class SuperPoint(Extractor):
-    """SuperPoint Convolutional Detector and Descriptor
-
-    SuperPoint: Self-Supervised Interest Point Detection and
-    Description. Daniel DeTone, Tomasz Malisiewicz, and Andrew
-    Rabinovich. In CVPRW, 2019. https://arxiv.org/abs/1712.07629
-
-    """
-
-    default_conf = {
-        "descriptor_dim": 256,
-        "nms_radius": 4,
-        "max_num_keypoints": None,
-        "detection_threshold": 0.0005,
-        "remove_borders": 4,
-    }
-
-    preprocess_conf = {
-        "resize": 1024,
-    }
-
-    required_data_keys = ["image"]
-
-    def __init__(self, **conf):
-        super().__init__(**conf)  # Update with default configuration.
-        self.relu = nn.ReLU(inplace=True)
-        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
-        c1, c2, c3, c4, c5 = 64, 64, 128, 128, 256
-
-        self.conv1a = nn.Conv2d(1, c1, kernel_size=3, stride=1, padding=1)
-        self.conv1b = nn.Conv2d(c1, c1, kernel_size=3, stride=1, padding=1)
-        self.conv2a = nn.Conv2d(c1, c2, kernel_size=3, stride=1, padding=1)
-        self.conv2b = nn.Conv2d(c2, c2, kernel_size=3, stride=1, padding=1)
-        self.conv3a = nn.Conv2d(c2, c3, kernel_size=3, stride=1, padding=1)
-        self.conv3b = nn.Conv2d(c3, c3, kernel_size=3, stride=1, padding=1)
-        self.conv4a = nn.Conv2d(c3, c4, kernel_size=3, stride=1, padding=1)
-        self.conv4b = nn.Conv2d(c4, c4, kernel_size=3, stride=1, padding=1)
-
-        self.convPa = nn.Conv2d(c4, c5, kernel_size=3, stride=1, padding=1)
-        self.convPb = nn.Conv2d(c5, 65, kernel_size=1, stride=1, padding=0)
-
-        self.convDa = nn.Conv2d(c4, c5, kernel_size=3, stride=1, padding=1)
-        self.convDb = nn.Conv2d(
-            c5, self.conf.descriptor_dim, kernel_size=1, stride=1, padding=0
-        )
-
-        url = "https://github.com/cvg/LightGlue/releases/download/v0.1_arxiv/superpoint_v1.pth"  # noqa
-        self.load_state_dict(torch.hub.load_state_dict_from_url(url))
-
-        if self.conf.max_num_keypoints is not None and self.conf.max_num_keypoints <= 0:
-            raise ValueError("max_num_keypoints must be positive or None")
-
-    def forward(self, data: dict) -> dict:
-        """Compute keypoints, scores, descriptors for image"""
-        for key in self.required_data_keys:
-            assert key in data, f"Missing key {key} in data"
-        image = data["image"]
-        if image.shape[1] == 3:
-            image = rgb_to_grayscale(image)
-
-        # Shared Encoder
-        x = self.relu(self.conv1a(image))
-        x = self.relu(self.conv1b(x))
-        x = self.pool(x)
-        x = self.relu(self.conv2a(x))
-        x = self.relu(self.conv2b(x))
-        x = self.pool(x)
-        x = self.relu(self.conv3a(x))
-        x = self.relu(self.conv3b(x))
-        x = self.pool(x)
-        x = self.relu(self.conv4a(x))
-        x = self.relu(self.conv4b(x))
-
-        # Compute the dense keypoint scores
-        cPa = self.relu(self.convPa(x))
-        scores = self.convPb(cPa)
-        scores = torch.nn.functional.softmax(scores, 1)[:, :-1]
-        b, _, h, w = scores.shape
-        scores = scores.permute(0, 2, 3, 1).reshape(b, h, w, 8, 8)
-        scores = scores.permute(0, 1, 3, 2, 4).reshape(b, h * 8, w * 8)
-        scores = simple_nms(scores, self.conf.nms_radius)
-
-        # Discard keypoints near the image borders
-        if self.conf.remove_borders:
-            pad = self.conf.remove_borders
-            scores[:, :pad] = -1
-            scores[:, :, :pad] = -1
-            scores[:, -pad:] = -1
-            scores[:, :, -pad:] = -1
-
-        # Extract keypoints
-        best_kp = torch.where(scores > self.conf.detection_threshold)
-        scores = scores[best_kp]
-
-        # Separate into batches
-        keypoints = [
-            torch.stack(best_kp[1:3], dim=-1)[best_kp[0] == i] for i in range(b)
-        ]
-        scores = [scores[best_kp[0] == i] for i in range(b)]
-
-        # Keep the k keypoints with highest score
-        if self.conf.max_num_keypoints is not None:
-            keypoints, scores = list(
-                zip(
-                    *[
-                        top_k_keypoints(k, s, self.conf.max_num_keypoints)
-                        for k, s in zip(keypoints, scores)
-                    ]
-                )
-            )
-
-        # Convert (h, w) to (x, y)
-        keypoints = [torch.flip(k, [1]).float() for k in keypoints]
-
-        # Compute the dense descriptors
-        cDa = self.relu(self.convDa(x))
-        descriptors = self.convDb(cDa)
-        descriptors = torch.nn.functional.normalize(descriptors, p=2, dim=1)
-
-        # Extract descriptors
-        descriptors = [
-            sample_descriptors(k[None], d[None], 8)[0]
-            for k, d in zip(keypoints, descriptors)
-        ]
-
-        return {
-            "keypoints": torch.stack(keypoints, 0),
-            "keypoint_scores": torch.stack(scores, 0),
-            "descriptors": torch.stack(descriptors, 0).transpose(-1, -2).contiguous(),
-        }
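
The two `permute`/`reshape` steps in the removed `SuperPoint.forward` unfold the 64 per-cell channels into an 8x8 pixel block, so that channel `r * 8 + c` of coarse cell `(i, j)` lands at pixel `(i * 8 + r, j * 8 + c)`. The sketch below reproduces that mapping with nested lists for one image; the function name `cells_to_dense` and the list-based layout are assumptions made for illustration.

```python
def cells_to_dense(cell_scores, cell=8):
    """Unfold per-cell channel scores into a dense score map.

    cell_scores[i][j] is a list of cell * cell values for coarse cell (i, j),
    with the extra "no keypoint" channel already dropped.
    """
    h, w = len(cell_scores), len(cell_scores[0])
    dense = [[0.0] * (w * cell) for _ in range(h * cell)]
    for i in range(h):
        for j in range(w):
            for ch, value in enumerate(cell_scores[i][j]):
                r, c = divmod(ch, cell)  # channel index -> offset inside the cell
                dense[i * cell + r][j * cell + c] = value
    return dense
```

With `cell=2`, a single row of two cells holding `[0, 1, 2, 3]` and `[4, 5, 6, 7]` unfolds to the rows `[0, 1, 4, 5]` and `[2, 3, 6, 7]`.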

+ 0 - 165
python/LightGlue/lightglue/utils.py

@@ -1,165 +0,0 @@
-import collections.abc as collections
-from pathlib import Path
-from types import SimpleNamespace
-from typing import Callable, List, Optional, Tuple, Union
-
-import cv2
-import kornia
-import numpy as np
-import torch
-
-
-class ImagePreprocessor:
-    default_conf = {
-        "resize": None,  # target edge length, None for no resizing
-        "side": "long",
-        "interpolation": "bilinear",
-        "align_corners": None,
-        "antialias": True,
-    }
-
-    def __init__(self, **conf) -> None:
-        super().__init__()
-        self.conf = {**self.default_conf, **conf}
-        self.conf = SimpleNamespace(**self.conf)
-
-    def __call__(self, img: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
-        """Resize and preprocess an image, return image and resize scale"""
-        h, w = img.shape[-2:]
-        if self.conf.resize is not None:
-            img = kornia.geometry.transform.resize(
-                img,
-                self.conf.resize,
-                side=self.conf.side,
-                antialias=self.conf.antialias,
-                align_corners=self.conf.align_corners,
-            )
-        scale = torch.Tensor([img.shape[-1] / w, img.shape[-2] / h]).to(img)
-        return img, scale
-
-
-def map_tensor(input_, func: Callable):
-    string_classes = (str, bytes)
-    if isinstance(input_, string_classes):
-        return input_
-    elif isinstance(input_, collections.Mapping):
-        return {k: map_tensor(sample, func) for k, sample in input_.items()}
-    elif isinstance(input_, collections.Sequence):
-        return [map_tensor(sample, func) for sample in input_]
-    elif isinstance(input_, torch.Tensor):
-        return func(input_)
-    else:
-        return input_
-
-
-def batch_to_device(batch: dict, device: str = "cpu", non_blocking: bool = True):
-    """Move batch (dict) to device"""
-
-    def _func(tensor):
-        return tensor.to(device=device, non_blocking=non_blocking).detach()
-
-    return map_tensor(batch, _func)
-
-
-def rbd(data: dict) -> dict:
-    """Remove batch dimension from elements in data"""
-    return {
-        k: v[0] if isinstance(v, (torch.Tensor, np.ndarray, list)) else v
-        for k, v in data.items()
-    }
-
-
-def read_image(path: Path, grayscale: bool = False) -> np.ndarray:
-    """Read an image from path as RGB or grayscale"""
-    if not Path(path).exists():
-        raise FileNotFoundError(f"No image at path {path}.")
-    mode = cv2.IMREAD_GRAYSCALE if grayscale else cv2.IMREAD_COLOR
-    image = cv2.imread(str(path), mode)
-    if image is None:
-        raise IOError(f"Could not read image at {path}.")
-    if not grayscale:
-        image = image[..., ::-1]
-    return image
-
-
-def numpy_image_to_torch(image: np.ndarray) -> torch.Tensor:
-    """Normalize the image tensor and reorder the dimensions."""
-    if image.ndim == 3:
-        image = image.transpose((2, 0, 1))  # HxWxC to CxHxW
-    elif image.ndim == 2:
-        image = image[None]  # add channel axis
-    else:
-        raise ValueError(f"Not an image: {image.shape}")
-    return torch.tensor(image / 255.0, dtype=torch.float)
-
-
-def resize_image(
-    image: np.ndarray,
-    size: Union[List[int], int],
-    fn: str = "max",
-    interp: Optional[str] = "area",
-) -> Tuple[np.ndarray, Tuple[float, float]]:
-    """Resize an image to a fixed size, or according to max or min edge.
-
-    Returns the resized image and the (x, y) scale factors."""
-    h, w = image.shape[:2]
-
-    fn = {"max": max, "min": min}[fn]
-    if isinstance(size, int):
-        scale = size / fn(h, w)
-        h_new, w_new = int(round(h * scale)), int(round(w * scale))
-        scale = (w_new / w, h_new / h)
-    elif isinstance(size, (tuple, list)):
-        h_new, w_new = size
-        scale = (w_new / w, h_new / h)
-    else:
-        raise ValueError(f"Incorrect new size: {size}")
-    mode = {
-        "linear": cv2.INTER_LINEAR,
-        "cubic": cv2.INTER_CUBIC,
-        "nearest": cv2.INTER_NEAREST,
-        "area": cv2.INTER_AREA,
-    }[interp]
-    return cv2.resize(image, (w_new, h_new), interpolation=mode), scale
-
-
-def load_image(path: Path, resize: Optional[int] = None, **kwargs) -> torch.Tensor:
-    image = read_image(path)
-    if resize is not None:
-        image, _ = resize_image(image, resize, **kwargs)
-    return numpy_image_to_torch(image)
-
-
-class Extractor(torch.nn.Module):
-    def __init__(self, **conf):
-        super().__init__()
-        self.conf = SimpleNamespace(**{**self.default_conf, **conf})
-
-    @torch.no_grad()
-    def extract(self, img: torch.Tensor, **conf) -> dict:
-        """Perform extraction with online resizing"""
-        if img.dim() == 3:
-            img = img[None]  # add batch dim
-        assert img.dim() == 4 and img.shape[0] == 1
-        shape = img.shape[-2:][::-1]
-        img, scales = ImagePreprocessor(**{**self.preprocess_conf, **conf})(img)
-        feats = self.forward({"image": img})
-        feats["image_size"] = torch.tensor(shape)[None].to(img).float()
-        feats["keypoints"] = (feats["keypoints"] + 0.5) / scales[None] - 0.5
-        return feats
-
-
-def match_pair(
-    extractor,
-    matcher,
-    image0: torch.Tensor,
-    image1: torch.Tensor,
-    device: str = "cpu",
-    **preprocess,
-):
-    """Match a pair of images (image0, image1) with an extractor and matcher"""
-    feats0 = extractor.extract(image0, **preprocess)
-    feats1 = extractor.extract(image1, **preprocess)
-    matches01 = matcher({"image0": feats0, "image1": feats1})
-    data = [feats0, feats1, matches01]
-    # remove batch dim and move to target device
-    feats0, feats1, matches01 = [batch_to_device(rbd(x), device) for x in data]
-    return feats0, feats1, matches01
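
The removed `Extractor.extract` maps keypoints detected on the resized image back to the original image with `(kpts + 0.5) / scales - 0.5`, which treats integer coordinates as pixel centers. A minimal sketch of that step, assuming keypoints are `(x, y)` tuples and `scale` is `(resized_w / original_w, resized_h / original_h)`; the name `to_original_coords` is hypothetical.

```python
def to_original_coords(keypoints, scale):
    """Map (x, y) keypoints from resized-image to original-image coordinates."""
    sx, sy = scale
    # Shift to the pixel-corner convention, undo the resize, shift back.
    return [((x + 0.5) / sx - 0.5, (y + 0.5) / sy - 0.5) for x, y in keypoints]
```

With an image downscaled by half, the center of resized pixel `(0, 0)` maps to `(0.5, 0.5)` in the original, the midpoint of the 2x2 block it covers, rather than to `(0, 0)` as a plain division would give.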

+ 0 - 203
python/LightGlue/lightglue/viz2d.py

@@ -1,203 +0,0 @@
-"""
-2D visualization primitives based on Matplotlib.
-1) Plot images with `plot_images`.
-2) Call `plot_keypoints` or `plot_matches` any number of times.
-3) Optionally: save a .png or .pdf plot (nice in papers!) with `save_plot`.
-"""
-
-import matplotlib
-import matplotlib.patheffects as path_effects
-import matplotlib.pyplot as plt
-import numpy as np
-import torch
-
-
-def cm_RdGn(x):
-    """Custom colormap: red (0) -> yellow (0.5) -> green (1)."""
-    x = np.clip(x, 0, 1)[..., None] * 2
-    c = x * np.array([[0, 1.0, 0]]) + (2 - x) * np.array([[1.0, 0, 0]])
-    return np.clip(c, 0, 1)
-
-
-def cm_BlRdGn(x_):
-    """Custom colormap: blue (-1) -> red (0.0) -> green (1)."""
-    x = np.clip(x_, 0, 1)[..., None] * 2
-    c = x * np.array([[0, 1.0, 0, 1.0]]) + (2 - x) * np.array([[1.0, 0, 0, 1.0]])
-
-    xn = -np.clip(x_, -1, 0)[..., None] * 2
-    cn = xn * np.array([[0, 0.1, 1, 1.0]]) + (2 - xn) * np.array([[1.0, 0, 0, 1.0]])
-    out = np.clip(np.where(x_[..., None] < 0, cn, c), 0, 1)
-    return out
-
-
-def cm_prune(x_):
-    """Custom colormap to visualize pruning"""
-    if isinstance(x_, torch.Tensor):
-        x_ = x_.cpu().numpy()
-    max_i = max(x_)
-    norm_x = np.where(x_ == max_i, -1, (x_ - 1) / 9)
-    return cm_BlRdGn(norm_x)
-
-
-def cm_grad2d(xy):
-    """2D grad. colormap: yellow (0, 0) -> green (1, 0) -> red (0, 1) -> blue (1, 1)."""
-    tl = np.array([1.0, 0, 0])  # red
-    tr = np.array([0, 0.0, 1])  # blue
-    ll = np.array([1.0, 1.0, 0])  # yellow
-    lr = np.array([0, 1.0, 0])  # green
-
-    xy = np.clip(xy, 0, 1)
-    x = xy[..., :1]
-    y = xy[..., -1:]
-    rgb = (1 - x) * (1 - y) * ll + x * (1 - y) * lr + x * y * tr + (1 - x) * y * tl
-    return rgb.clip(0, 1)
-
-
-def plot_images(imgs, titles=None, cmaps="gray", dpi=100, pad=0.5, adaptive=True):
-    """Plot a set of images horizontally.
-    Args:
-        imgs: list of NumPy RGB (H, W, 3) or PyTorch RGB (3, H, W) or mono (H, W).
-        titles: a list of strings, as titles for each image.
-        cmaps: colormaps for monochrome images.
-        adaptive: whether the figure size should fit the image aspect ratios.
-    """
-    # conversion to (H, W, 3) for torch.Tensor
-    imgs = [
-        (
-            img.permute(1, 2, 0).cpu().numpy()
-            if (isinstance(img, torch.Tensor) and img.dim() == 3)
-            else img
-        )
-        for img in imgs
-    ]
-
-    n = len(imgs)
-    if not isinstance(cmaps, (list, tuple)):
-        cmaps = [cmaps] * n
-
-    if adaptive:
-        ratios = [i.shape[1] / i.shape[0] for i in imgs]  # W / H
-    else:
-        ratios = [4 / 3] * n
-    figsize = [sum(ratios) * 4.5, 4.5]
-    fig, ax = plt.subplots(
-        1, n, figsize=figsize, dpi=dpi, gridspec_kw={"width_ratios": ratios}
-    )
-    if n == 1:
-        ax = [ax]
-    for i in range(n):
-        ax[i].imshow(imgs[i], cmap=plt.get_cmap(cmaps[i]))
-        ax[i].get_yaxis().set_ticks([])
-        ax[i].get_xaxis().set_ticks([])
-        ax[i].set_axis_off()
-        for spine in ax[i].spines.values():  # remove frame
-            spine.set_visible(False)
-        if titles:
-            ax[i].set_title(titles[i])
-    fig.tight_layout(pad=pad)
-
-
-def plot_keypoints(kpts, colors="lime", ps=4, axes=None, a=1.0):
-    """Plot keypoints for existing images.
-    Args:
-        kpts: list of ndarrays of size (N, 2).
-        colors: string, or list of list of tuples (one for each keypoints).
-        ps: size of the keypoints as float.
-    """
-    if not isinstance(colors, list):
-        colors = [colors] * len(kpts)
-    if not isinstance(a, list):
-        a = [a] * len(kpts)
-    if axes is None:
-        axes = plt.gcf().axes
-    for ax, k, c, alpha in zip(axes, kpts, colors, a):
-        if isinstance(k, torch.Tensor):
-            k = k.cpu().numpy()
-        ax.scatter(k[:, 0], k[:, 1], c=c, s=ps, linewidths=0, alpha=alpha)
-
-
-def plot_matches(kpts0, kpts1, color=None, lw=1.5, ps=4, a=1.0, labels=None, axes=None):
-    """Plot matches for a pair of existing images.
-    Args:
-        kpts0, kpts1: corresponding keypoints of size (N, 2).
-        color: color of each match, string or RGB tuple. Random if not given.
-        lw: width of the lines.
-        ps: size of the end points (no endpoint if ps=0)
-        axes: pair of axes to draw the matches on (defaults to the first two axes of the current figure).
-        a: alpha opacity of the match lines.
-    """
-    fig = plt.gcf()
-    if axes is None:
-        ax = fig.axes
-        ax0, ax1 = ax[0], ax[1]
-    else:
-        ax0, ax1 = axes
-    if isinstance(kpts0, torch.Tensor):
-        kpts0 = kpts0.cpu().numpy()
-    if isinstance(kpts1, torch.Tensor):
-        kpts1 = kpts1.cpu().numpy()
-    assert len(kpts0) == len(kpts1)
-    if color is None:
-        kpts_norm = (kpts0 - kpts0.min(axis=0, keepdims=True)) / np.ptp(
-            kpts0, axis=0, keepdims=True
-        )
-        color = cm_grad2d(kpts_norm)  # gradient color
-    elif len(color) > 0 and not isinstance(color[0], (tuple, list)):
-        color = [color] * len(kpts0)
-
-    if lw > 0:
-        for i in range(len(kpts0)):
-            line = matplotlib.patches.ConnectionPatch(
-                xyA=(kpts0[i, 0], kpts0[i, 1]),
-                xyB=(kpts1[i, 0], kpts1[i, 1]),
-                coordsA=ax0.transData,
-                coordsB=ax1.transData,
-                axesA=ax0,
-                axesB=ax1,
-                zorder=1,
-                color=color[i],
-                linewidth=lw,
-                clip_on=True,
-                alpha=a,
-                label=None if labels is None else labels[i],
-                picker=5.0,
-            )
-            line.set_annotation_clip(True)
-            fig.add_artist(line)
-
-    # freeze the axes to prevent the transform to change
-    ax0.autoscale(enable=False)
-    ax1.autoscale(enable=False)
-
-    if ps > 0:
-        ax0.scatter(kpts0[:, 0], kpts0[:, 1], c=color, s=ps)
-        ax1.scatter(kpts1[:, 0], kpts1[:, 1], c=color, s=ps)
-
-
-def add_text(
-    idx,
-    text,
-    pos=(0.01, 0.99),
-    fs=15,
-    color="w",
-    lcolor="k",
-    lwidth=2,
-    ha="left",
-    va="top",
-):
-    ax = plt.gcf().axes[idx]
-    t = ax.text(
-        *pos, text, fontsize=fs, ha=ha, va=va, color=color, transform=ax.transAxes
-    )
-    if lcolor is not None:
-        t.set_path_effects(
-            [
-                path_effects.Stroke(linewidth=lwidth, foreground=lcolor),
-                path_effects.Normal(),
-            ]
-        )
-
-
-def save_plot(path, **kw):
-    """Save the current figure without any white margin."""
-    plt.savefig(path, bbox_inches="tight", pad_inches=0, **kw)

+ 0 - 30
python/LightGlue/pyproject.toml

@@ -1,30 +0,0 @@
-[project]
-name = "lightglue"
-description = "LightGlue: Local Feature Matching at Light Speed"
-version = "0.0"
-authors = [
-    {name = "Philipp Lindenberger"},
-    {name = "Paul-Edouard Sarlin"},
-]
-readme = "README.md"
-requires-python = ">=3.6"
-license = {file = "LICENSE"}
-classifiers = [
-    "Programming Language :: Python :: 3",
-    "License :: OSI Approved :: Apache Software License",
-    "Operating System :: OS Independent",
-]
-urls = {Repository = "https://github.com/cvg/LightGlue/"}
-dynamic = ["dependencies"]
-
-[project.optional-dependencies]
-dev = ["black==23.12.1", "flake8", "isort"]
-
-[tool.setuptools]
-packages = ["lightglue"]
-
-[tool.setuptools.dynamic]
-dependencies = {file = ["requirements.txt"]}
-
-[tool.isort]
-profile = "black"

+ 19 - 0
python/RapidOCR/.gitattributes

@@ -0,0 +1,19 @@
+# Set the default behavior, in case people don't have core.autocrlf set.
+* text=auto
+
+# Explicitly declare text files you want to always be normalized and converted
+# to native line endings on checkout.
+*.c text
+*.h text
+*.py text
+*.md text
+*.js text
+*.cpp text
+
+# Declare files that will always have CRLF line endings on checkout.
+*.sln text eol=crlf
+
+# Denote all files that are truly binary and should not be modified.
+*.png binary
+*.jpg binary
+*.pdf binary

+ 13 - 0
python/RapidOCR/.github/FUNDING.yml

@@ -0,0 +1,13 @@
+# These are supported funding model platforms
+
+github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
+patreon: # Replace with a single Patreon username
+open_collective: # Replace with a single Open Collective username
+ko_fi: # Replace with a single Ko-fi username
+tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
+community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
+liberapay: # Replace with a single Liberapay username
+issuehunt: # Replace with a single IssueHunt username
+otechie: # Replace with a single Otechie username
+lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
+custom: https://rapidai.github.io/RapidOCRDocs/main/sponsor/

+ 21 - 0
python/RapidOCR/.github/ISSUE_TEMPLATE/bug.md

@@ -0,0 +1,21 @@
+---
+name: 🐞 Bug
+about: Bug
+title: 'Bug'
+labels: 'Bug'
+assignees: ''
+
+---
+
+#### 问题描述 / Problem Description
+
+
+#### 运行环境 / Runtime Environment
+
+
+#### 复现代码 / Reproduction Code
+```python
+
+```
+
+#### 可能解决方案 / Possible solutions

+ 11 - 0
python/RapidOCR/.github/ISSUE_TEMPLATE/config.yml

@@ -0,0 +1,11 @@
+blank_issues_enabled: false
+contact_links:
+  - name: ❓ Questions
+    url: https://github.com/RapidAI/RapidOCR/discussions/categories/q-a
+    about: Please use the community forum for help and questions regarding RapidOCR.
+  - name: 💡 Feature requests and ideas
+    url: https://github.com/RapidAI/RapidOCR/discussions/categories/ideas
+    about: Please vote for and post new feature ideas in the community forum.
+  - name: 📖 Documentation
+    url: https://rapidai.github.io/RapidOCRDocs
+    about: A great place to find instructions and answers about RapidOCR.

+ 35 - 0
python/RapidOCR/.github/workflows/SyncToGitee.yml

@@ -0,0 +1,35 @@
+name: SyncToGitee
+on:
+  push:
+    branches:
+      - main
+jobs:
+  repo-sync:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout source codes
+        uses: actions/checkout@v4
+
+      - name: Mirror the Github organization repos to Gitee.
+        uses: Yikun/hub-mirror-action@v1.4
+        with:
+          src: 'github/RapidAI'
+          dst: 'gitee/RapidAI'
+          dst_key: ${{ secrets.GITEE_PRIVATE_KEY }}
+          dst_token:  ${{ secrets.GITEE_TOKEN }}
+          force_update: true
+          # only sync this repo
+          static_list: "RapidOCR"
+          debug: true
+
+      - name: Mirror the Github organization repos to Gitee.
+        uses: Yikun/hub-mirror-action@v1.4
+        with:
+          src: 'github/RapidAI'
+          dst: 'gitee/openKylin'
+          dst_key: ${{ secrets.GITEE_PRIVATE_KEY }}
+          dst_token:  ${{ secrets.GITEE_TOKEN }}
+          force_update: true
+          # only sync this repo
+          static_list: "RapidOCR"
+          debug: true

+ 78 - 0
python/RapidOCR/.github/workflows/gen_whl_to_pypi_rapidocr.yml

@@ -0,0 +1,78 @@
+name: Push rapidocr to pypi
+
+on:
+  push:
+    tags:
+      - v*
+
+env:
+  RESOURCES_URL: https://github.com/RapidAI/RapidOCR/releases/download/v1.1.0/required_for_whl_v3.0.0.zip
+
+jobs:
+  UnitTesting:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Pull latest code
+        uses: actions/checkout@v4
+
+      - name: Set up Python 3.10
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.10'
+          architecture: 'x64'
+
+      - name: Display Python version
+        run: python -c "import sys; print(sys.version)"
+
+      - name: Unit tests
+        run: |
+          cd python
+          pip install -r requirements.txt
+          pip install pytest wheel get_pypi_latest_version openvino==2023.3.0 onnxruntime python-bidi MNN
+          pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+          python -m pip install paddlepaddle==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+
+          pytest tests/test_*.py
+
+  GenerateWHL_PushPyPi:
+    needs: UnitTesting
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python 3.10
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.10'
+          architecture: 'x64'
+
+      - name: Download models
+        run: |
+          cd python
+          wget $RESOURCES_URL
+          ZIP_NAME=${RESOURCES_URL##*/}
+          DIR_NAME=${ZIP_NAME%.*}
+          unzip $ZIP_NAME
+          cp $DIR_NAME/resources/models/*.* rapidocr/models
+
+      - name: Run setup.py
+        run: |
+          cd python
+          pip install setuptools get_pypi_latest_version wheel
+          mkdir rapidocr_t
+          mv rapidocr rapidocr_t
+          mv rapidocr_t rapidocr
+          cd rapidocr
+          echo "from .rapidocr.main import RapidOCR, VisRes" > __init__.py
+
+          cd ..
+          python -m pip install --upgrade pip
+          python setup.py bdist_wheel ${{ github.ref_name }}
+          mv dist ../
+
+      - name: Publish distribution 📦 to PyPI
+        uses: pypa/gh-action-pypi-publish@v1.5.0
+        with:
+          password: ${{ secrets.RAPIDOCR }}
+          packages_dir: dist/

+ 41 - 0
python/RapidOCR/.github/workflows/push_discord.yml

@@ -0,0 +1,41 @@
+name: discord message
+on:
+  release:
+    types: [published]
+
+jobs:
+  notify:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Prepare Discord message
+        id: prepare_message
+        run: |
+          full_msg="🚀 **New Release!** ${{ github.event.release.name }}
+
+          **Tag:** ${{ github.event.release.tag_name }}
+          **Author:** ${{ github.event.release.author.login }}
+
+          **Release Notes:**
+          ${{ github.event.release.body }}"
+
+              if [ ${#full_msg} -gt 1990 ]; then
+                truncated_msg="${full_msg:0:1987}..."
+              else
+                truncated_msg="$full_msg"
+              fi
+
+              echo "message<<EOF" >> "$GITHUB_OUTPUT"
+              echo "$truncated_msg" >> "$GITHUB_OUTPUT"
+              echo "EOF" >> "$GITHUB_OUTPUT"
+
+      - name: Send to Discord
+        env:
+          DISCORD_WEBHOOK: ${{ secrets.DISCORD_WEBHOOK_URL }}
+          DISCORD_USERNAME: Github Actions
+          DISCORD_AVATAR: https://cdn.discordapp.com/avatars/1460099944252702846/e57fd67dc7ca0cc840a0e87a82281bc5.webp?size=80
+        uses: Ilshidur/action-discord@0.4.0
+        with:
+          args: ${{ steps.prepare_message.outputs.message }}

+ 32 - 33
python/onnxocr/.gitignore → python/RapidOCR/.gitignore

@@ -1,7 +1,10 @@
+# Created by .ignore support plugin (hsz.mobi)
+### Python template
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
 *$py.class
+.pytest_cache
 
 # C extensions
 *.so
@@ -20,6 +23,7 @@ parts/
 sdist/
 var/
 wheels/
+pip-wheel-metadata/
 share/python-wheels/
 *.egg-info/
 .installed.cfg
@@ -29,8 +33,9 @@ MANIFEST
 # PyInstaller
 #  Usually these files are written by a python script from a template
 #  before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec
+# *.manifest
+# *.spec
+*.res
 
 # Installer logs
 pip-log.txt
@@ -49,7 +54,6 @@ coverage.xml
 *.py,cover
 .hypothesis/
 .pytest_cache/
-cover/
 
 # Translations
 *.mo
@@ -72,7 +76,6 @@ instance/
 docs/_build/
 
 # PyBuilder
-.pybuilder/
 target/
 
 # Jupyter Notebook
@@ -83,9 +86,7 @@ profile_default/
 ipython_config.py
 
 # pyenv
-#   For a library or package, you might want to ignore these files since the code is
-#   intended to run in multiple environments; otherwise, check them in:
-# .python-version
+.python-version
 
 # pipenv
 #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
@@ -94,22 +95,7 @@ ipython_config.py
 #   install all needed dependencies.
 #Pipfile.lock
 
-# poetry
-#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-#   This is especially recommended for binary packages to ensure reproducibility, and is more
-#   commonly ignored for libraries.
-#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
-
-# pdm
-#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-#pdm.lock
-#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
-#   in version control.
-#   https://pdm.fming.dev/#use-with-ide
-.pdm.toml
-
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
 __pypackages__/
 
 # Celery stuff
@@ -146,15 +132,28 @@ dmypy.json
 # Pyre type checker
 .pyre/
 
-# pytype static type analyzer
-.pytype/
+#idea
+.vs
+.vscode
+.idea
+/images
+/models
 
-# Cython debug symbols
-cython_debug/
+#models
+*.onnx
 
-# PyCharm
-#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
-#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
-#  and can be added to the global gitignore or merged into this file.  For a more nuclear
-#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
+*.ttf
+*.ttc
+
+long1.jpg
+
+*.bin
+*.mapping
+*.xml
+
+*.pdiparams
+*.pdiparams.info
+*.pdmodel
+
+.DS_Store
+*.npy

+ 18 - 0
python/RapidOCR/.pre-commit-config.yaml

@@ -0,0 +1,18 @@
+repos:
+- repo: https://gitee.com/SWHL/autoflake
+  rev: v2.1.1
+  hooks:
+    - id: autoflake
+      args:
+        [
+          "--recursive",
+          "--in-place",
+          "--remove-all-unused-imports",
+          "--ignore-init-module-imports",
+        ]
+      files: \.py$
+- repo: https://gitee.com/SWHL/black
+  rev: 23.1.0
+  hooks:
+    - id: black
+      files: \.py$

+ 3 - 1
python/onnxocr/LICENSE → python/RapidOCR/LICENSE

@@ -1,3 +1,5 @@
+Copyright (c) 2021 RapidOCR Authors. All rights reserved.
+
                                  Apache License
                            Version 2.0, January 2004
                         http://www.apache.org/licenses/
@@ -186,7 +188,7 @@
       same "printed page" as the copyright notice for easier
       identification within third-party archives.
 
-   Copyright [yyyy] [name of copyright owner]
+   Copyright (c) 2021 RapidAI. All rights reserved.
 
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.

+ 182 - 0
python/RapidOCR/README-CN.md

@@ -0,0 +1,182 @@
+<div align="center">
+  <picture>
+    <source media="(prefers-color-scheme: dark)" srcset="https://github.com/RapidAI/RapidOCR/releases/download/v1.1.0/Logov2_black.png"  width="60%" height="60%">
+    <source media="(prefers-color-scheme: light)" srcset="https://github.com/RapidAI/RapidOCR/releases/download/v1.1.0/Logov2_white.png"  width="60%" height="60%">
+    <img alt="Shows an illustrated sun in light mode and a moon with stars in dark mode." src="https://github.com/RapidAI/RapidOCR/releases/download/v1.1.0/Logov2_white.png">
+  </picture>
+
+<div>&nbsp;</div>
+<div align="center">
+    <b><font size="4"><i>信创级开源OCR - 为世界内容安全贡献力量</i></font></b>
+</div>
+<div>&nbsp;</div>
+
+<a href="https://huggingface.co/spaces/RapidAI/RapidOCRv3" target="_blank"><img src="https://img.shields.io/badge/%F0%9F%A4%97-Hugging Face Demo-blue"></a>
+<a href="https://www.modelscope.cn/studios/RapidAI/RapidOCRv3.0.0/summary" target="_blank"><img src="https://img.shields.io/badge/魔搭-Demo-blue"></a>
+<a href="https://colab.research.google.com/github/RapidAI/RapidOCR/blob/main/assets/RapidOCRDemo.ipynb" target="_blank"><img src="https://raw.githubusercontent.com/RapidAI/RapidOCR/main/assets/colab-badge.svg" alt="Open in Colab"></a>
+<a href=""><img src="https://img.shields.io/badge/Python->=3.6-aff.svg"></a>
+<a href=""><img src="https://img.shields.io/badge/OS-Linux%2C%20Win%2C%20Mac-pink.svg"></a>
+<a href="https://github.com/RapidAI/RapidOCR/graphs/contributors"><img src="https://img.shields.io/github/contributors/RapidAI/RapidOCR?color=9ea"></a>
+<a href="https://pepy.tech/project/rapidocr"><img src="https://static.pepy.tech/personalized-badge/rapidocr?period=total&units=abbreviation&left_color=grey&right_color=blue&left_text=Downloads%20rapidocr"></a>
+<a href="https://pypi.org/project/rapidocr/"><img alt="PyPI" src="https://img.shields.io/pypi/v/rapidocr"></a>
+<a href="https://github.com/RapidAI/RapidOCR/stargazers"><img src="https://img.shields.io/github/stars/RapidAI/RapidOCR?color=ccf"></a>
+<a href="https://semver.org/"><img alt="SemVer2.0" src="https://img.shields.io/badge/SemVer-2.0-brightgreen"></a>
+<a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
+
+简体中文 | [English](./README.md)
+
+</div>
+
+### 📝 简介
+
+RapidOCR 是一款完全开源免费、支持离线快速部署的多平台多语言 OCR 工具,以极致的速度与广泛的兼容性为核心优势。
+
+**支持语言:** 默认支持中英文识别。其他支持的语言,参见文档:[模型列表](https://rapidai.github.io/RapidOCRDocs/main/model_list/)
+
+**项目缘起:** 鉴于 [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) 在工程化方面仍有优化空间,为简化并加速 OCR 模型在各类终端设备上的推理部署,我们创新性地将 PaddleOCR 中的模型转换为高度兼容的 ONNX 格式,并基于 Python, C++, Java, C# 等多种编程语言,实现了跨平台的无缝移植,让开发者能够轻松上手、高效集成。
+
+**名称寓意:** RapidOCR 这一名称承载着我们对产品的核心期待——轻快(操作简便、响应迅速)、好省(资源占用低、成本效益高)且智能(依托深度学习技术,实现精准高效的识别)。我们专注于发挥人工智能的优势,打造小巧而强大的模型,始终将速度作为不懈追求,同时确保卓越的识别效果。
+
+**使用指南:**
+
+- **直接部署**:若本仓库提供的模型已满足您的需求,只需参考 [快速开始](https://rapidai.github.io/RapidOCRDocs/main/quickstart/) 即可快速完成 RapidOCR 的部署与使用。
+- **定制化微调**:若现有模型无法满足特定场景需求,您可在 PaddleOCR 基础上,利用自有数据进行微调,再将优化后的模型应用于 RapidOCR 的部署流程,实现个性化定制。
+
+如果您觉得本项目对您的工作或学习有所帮助,恳请您不吝赐予一颗 ⭐ Star,给予我们宝贵的支持与鼓励!
+
+### 🎥 效果展示
+
+<div align="center">
+    <img src="https://github.com/RapidAI/RapidOCR/releases/download/v1.1.0/demo.gif" alt="Demo" width="100%" height="100%">
+</div>
+
+### 🛠️ 安装
+
+```bash
+pip install rapidocr onnxruntime
+```
+
+### 📋 使用
+
+```python
+from rapidocr import RapidOCR
+
+engine = RapidOCR()
+
+img_url = "https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/master/resources/test_files/ch_en_num.jpg"
+result = engine(img_url)
+print(result)
+
+result.vis("vis_result.jpg")
+```
+
+### 📚 文档
+
+完整文档请移步:[docs](https://rapidai.github.io/RapidOCRDocs)
+
+### 👥 谁在使用?([更多](https://github.com/RapidAI/RapidOCR/network/dependents))
+
+- [Docling](https://github.com/DS4SD/docling)
+- [CnOCR](https://github.com/breezedeus/CnOCR)
+- [api-for-open-llm](https://github.com/xusenlinzy/api-for-open-llm)
+- [arknights-mower](https://github.com/ArkMowers/arknights-mower)
+- [pensieve](https://github.com/arkohut/pensieve)
+- [genshin_artifact_auxiliary](https://github.com/SkeathyTomas/genshin_artifact_auxiliary)
+- [ChatLLM](https://github.com/yuanjie-ai/ChatLLM)
+- [langchain](https://github.com/langchain-ai/langchain)
+- [Langchain-Chatchat](https://github.com/chatchat-space/Langchain-Chatchat)
+- [JamAIBase](https://github.com/EmbeddedLLM/JamAIBase)
+- [PAI-RAG](https://github.com/aigc-apps/PAI-RAG)
+- [ChatAgent_RAG](https://github.com/junyuyang7/ChatAgent_RAG)
+- [OpenAdapt](https://github.com/OpenAdaptAI/OpenAdapt)
+- [Umi-OCR](https://github.com/hiroi-sora/Umi-OCR)
+
+> 更多使用 RapidOCR 的项目,欢迎在 [登记地址](https://github.com/RapidAI/RapidOCR/discussions/286) 登记,登记仅仅为了产品推广。
+
+### 🙏 致谢
+
+- 非常感谢 [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) 的一切。
+- 非常感谢 [PaddleOCR2Pytorch](https://github.com/frotms/PaddleOCR2Pytorch) 提供转换后的 PyTorch 格式模型。
+- 非常感谢 [PaddleX](https://github.com/PaddlePaddle/PaddleX) 提供文档模型。
+- 非常感谢 [DeliciaLaniD](https://github.com/DeliciaLaniD) 修复 ocrweb 中扫描动画起始位置错位问题。
+- 非常感谢 [zhsunlight](https://github.com/zhsunlight) 关于参数化调用 GPU 推理的建议以及细致周到的测试。
+- 非常感谢 [lzh111222334](https://github.com/lzh111222334) 修复 python 版本下 rec 前处理部分 bug。
+- 非常感谢 [AutumnSun1996](https://github.com/AutumnSun1996) 在 [#42](https://github.com/RapidAI/RapidOCR/issues/42) 中的建议。
+- 非常感谢 [DeadWood8](https://github.com/DeadWood8) 提供了 [Nuitka 打包 rapidocr_web 的操作文档和可执行 exe](https://rapidai.github.io/RapidOCRDocs/main/install_usage/rapidocr_web/nuitka_package/)。
+- 非常感谢 [Loovelj](https://github.com/Loovelj) 指出对文本检测框排序时顺序问题,详情参见 [issue 75](https://github.com/RapidAI/RapidOCR/issues/75)。
+
+### 🤝 贡献指南
+
+本仓库为 RapidOCR 的 **Python** 部分,其他语言已迁移至独立仓库。
+
+参与 Python 开发的完整流程请参阅:[**Python CONTRIBUTING**](docs/CONTRIBUTING-CN.md)。
+
+### 🎖 贡献者
+
+<p align="left">
+  <a href="https://github.com/RapidAI/RapidOCR/graphs/contributors">
+    <img src="https://contrib.rocks/image?repo=RapidAI/RapidOCR&max=400&columns=10" width="60%"/>
+  </a>
+</p>
+
+### 🤝 加入我们
+
+请移步:[link](https://rapidai.github.io/RapidOCRDocs/main/communicate/)
+
+### 🌟 赞助商 & 支持者
+
+RapidOCR 是一个基于 Apache2.0 许可的开源项目,其持续开发之所以能够实现,完全得益于这些出色支持者的助力。如果您也想加入他们的行列,不妨考虑 [赞助 RapidOCR](<https://rapidai.github.io/RapidOCRDocs/main/sponsor/>) 的开发。
+
+#### 赞助商
+
+|Sponsors|Application|Introduction|
+|:---:|:---:|:---|
+|<img src="https://github.com/RapidAI/RapidOCR/releases/download/v1.1.0/Quicker.jpg" width=65 height=65  style="border-radius: 12px; border: none;"/>|[Quicker](https://getquicker.net/)|您的指尖工作箱|
+
+#### 支持者
+
+<table>
+  <tr>
+    <td align="center">
+      <a href="https://github.com/zg-young">
+        <img src="https://avatars.githubusercontent.com/u/56186938?v=4" width="60" style="border-radius: 12px; border: none;"/>
+      </a><br />
+      <sub><a href="https://github.com/zg-young">@zg-young</a></sub>
+    </td>
+    <td align="center">
+      <a href="https://github.com/Eunsolfs">
+        <img src="https://avatars.githubusercontent.com/u/53815751?v=4" width="60"  style="border-radius: 12px; border: none;"/>
+      </a><br />
+      <sub><a href="https://github.com/Eunsolfs">@Eunsolfs</a></sub>
+    </td>
+    <td align="center">
+      <a href="https://github.com/youzzhang">
+        <img src="https://avatars.githubusercontent.com/u/49047676?v=4" width="60" style="border-radius: 12px; border: none;"/>
+      </a><br />
+      <sub><a href="https://github.com/youzzhang">@youzzhang</a></sub>
+    </td>
+  </tr>
+</table>
+
+### 📜 引用
+
+如果您发现该项目对您的研究有用,请考虑引用:
+
+```bibtex
+@misc{RapidOCR 2021,
+    title={{Rapid OCR}: OCR Toolbox},
+    author={RapidAI Team},
+    howpublished = {\url{https://github.com/RapidAI/RapidOCR}},
+    year={2021}
+}
+```
+
+### ⭐️ Star history
+
+[![Stargazers over time](https://starchart.cc/RapidAI/RapidOCR.svg?variant=adaptive)](https://starchart.cc/RapidAI/RapidOCR)
+
+### ⚖️ 开源许可证
+
+OCR 模型版权归百度所有,其他工程代码版权归本仓库所有者所有。
+
+该项目采用 [Apache 2.0 license](../LICENSE) 开源许可证。

+ 179 - 0
python/RapidOCR/README.md

@@ -0,0 +1,179 @@
+<div align="center">
+  <picture>
+    <source media="(prefers-color-scheme: dark)" srcset="https://github.com/RapidAI/RapidOCR/releases/download/v1.1.0/Logov2_black.png"  width="60%" height="60%">
+    <source media="(prefers-color-scheme: light)" srcset="https://github.com/RapidAI/RapidOCR/releases/download/v1.1.0/Logov2_white.png"  width="60%" height="60%">
+    <img alt="Shows an illustrated sun in light mode and a moon with stars in dark mode." src="https://github.com/RapidAI/RapidOCR/releases/download/v1.1.0/Logov2_white.png">
+  </picture>
+
+<div>&nbsp;</div>
+<div align="center">
+    <b><font size="4"><i>Open source OCR for the security of the digital world</i></font></b>
+</div>
+<div>&nbsp;</div>
+
+<a href="https://huggingface.co/spaces/RapidAI/RapidOCRv3" target="_blank"><img src="https://img.shields.io/badge/%F0%9F%A4%97-Hugging Face Demo-blue"></a>
+<a href="https://www.modelscope.cn/studios/RapidAI/RapidOCRv3.0.0/summary" target="_blank"><img src="https://img.shields.io/badge/魔搭-Demo-blue"></a>
+<a href="https://colab.research.google.com/github/RapidAI/RapidOCR/blob/main/assets/RapidOCRDemo.ipynb" target="_blank"><img src="https://raw.githubusercontent.com/RapidAI/RapidOCR/main/assets/colab-badge.svg" alt="Open in Colab"></a>
+<a href=""><img src="https://img.shields.io/badge/Python->=3.6-aff.svg"></a>
+<a href=""><img src="https://img.shields.io/badge/OS-Linux%2C%20Win%2C%20Mac-pink.svg"></a>
+<a href="https://github.com/RapidAI/RapidOCR/graphs/contributors"><img src="https://img.shields.io/github/contributors/RapidAI/RapidOCR?color=9ea"></a>
+<a href="https://pepy.tech/project/rapidocr"><img src="https://static.pepy.tech/personalized-badge/rapidocr?period=total&units=abbreviation&left_color=grey&right_color=blue&left_text=Downloads%20rapidocr"></a>
+<a href="https://pypi.org/project/rapidocr/"><img alt="PyPI" src="https://img.shields.io/pypi/v/rapidocr"></a>
+<a href="https://github.com/RapidAI/RapidOCR/stargazers"><img src="https://img.shields.io/github/stars/RapidAI/RapidOCR?color=ccf"></a>
+<a href="https://semver.org/"><img alt="SemVer2.0" src="https://img.shields.io/badge/SemVer-2.0-brightgreen"></a>
+<a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
+
+Join our [Discord](https://discord.gg/33eyQJq498)
+
+[简体中文](./README-CN.md) | English
+</div>
+
+### 📝 Introduction
+
+RapidOCR is a completely open-source, free OCR tool that supports multi-platform, multi-language operation and rapid offline deployment. Its core advantages lie in extreme speed and extensive compatibility.
+
+**Supported Languages:** Chinese and English recognition are supported by default. For other supported languages, see the [Model List](https://rapidai.github.io/RapidOCRDocs/main/model_list/).
+
+**Project Origin:** Considering that [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) still has room for optimization in engineering aspects, we innovatively converted the models in PaddleOCR into the highly compatible ONNX format to simplify and accelerate the inference deployment of OCR models on various terminal devices. Furthermore, we achieved seamless cross-platform porting based on multiple programming languages such as Python, C++, Java, and C#, enabling developers to get started easily and integrate efficiently.
+
+**Name Implication:** The name "RapidOCR" embodies our core expectations for the product: **Rapid** (simple operation, fast response), **Good & Economical** (low resource consumption, high cost-effectiveness), and **Intelligent** (achieving accurate and efficient recognition relying on deep learning technology). We focus on leveraging the advantages of artificial intelligence to create compact yet powerful models, relentlessly pursuing speed while ensuring excellent recognition results.
+
+**User Guide:**
+
+- **Direct Deployment:** If the models provided in this repository meet your needs, simply refer to the [Official Documentation](https://rapidai.github.io/RapidOCRDocs/main/quickstart/) to quickly complete the deployment and usage of RapidOCR.
+- **Custom Fine-tuning:** If the existing models cannot meet specific scenario requirements, you can fine-tune them using your own data based on PaddleOCR, and then apply the optimized models to the RapidOCR deployment process to achieve personalized customization.
+
+If you find this project helpful for your work or study, we kindly ask you to give us a ⭐ Star to provide valuable support and encouragement!
+
+### 🎥 Visualization
+
+<div align="center">
+    <img src="https://github.com/RapidAI/RapidOCR/releases/download/v1.1.0/demo.gif" alt="Demo" width="100%" height="100%">
+</div>
+
+### 🛠️ Installation
+
+```bash
+pip install rapidocr onnxruntime
+```
+
+### 📋 Usage
+
+```python
+from rapidocr import RapidOCR
+
+engine = RapidOCR()
+
+img_url = "https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/master/resources/test_files/ch_en_num.jpg"
+result = engine(img_url)
+print(result)
+
+result.vis("vis_result.jpg")
+```
+
+### 📚 Documentation
+
+Full documentation (in Chinese) is available at [docs](https://rapidai.github.io/RapidOCRDocs/).
+
+### 👥 Who uses it? ([more](https://github.com/RapidAI/RapidOCR/network/dependents))
+
+- [Docling](https://github.com/DS4SD/docling)
+- [CnOCR](https://github.com/breezedeus/CnOCR)
+- [api-for-open-llm](https://github.com/xusenlinzy/api-for-open-llm)
+- [arknights-mower](https://github.com/ArkMowers/arknights-mower)
+- [pensieve](https://github.com/arkohut/pensieve)
+- [genshin_artifact_auxiliary](https://github.com/SkeathyTomas/genshin_artifact_auxiliary)
+- [ChatLLM](https://github.com/yuanjie-ai/ChatLLM)
+- [langchain](https://github.com/langchain-ai/langchain)
+- [Langchain-Chatchat](https://github.com/chatchat-space/Langchain-Chatchat)
+- [JamAIBase](https://github.com/EmbeddedLLM/JamAIBase)
+- [PAI-RAG](https://github.com/aigc-apps/PAI-RAG)
+- [ChatAgent_RAG](https://github.com/junyuyang7/ChatAgent_RAG)
+- [OpenAdapt](https://github.com/OpenAdaptAI/OpenAdapt)
+- [Umi-OCR](https://github.com/hiroi-sora/Umi-OCR)
+
+> If your project uses RapidOCR, you are welcome to [register it](https://github.com/RapidAI/RapidOCR/discussions/286). Registration is used solely for promoting the project.
+
+### 🙏 Acknowledgements
+
+- Many thanks to [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) for everything.
+- Many thanks to [PaddleOCR2Pytorch](https://github.com/frotms/PaddleOCR2Pytorch) for providing the converted PyTorch format models.
+- Many thanks to [PaddleX](https://github.com/PaddlePaddle/PaddleX) for providing the document models.
+- Many thanks to [DeliciaLaniD](https://github.com/DeliciaLaniD) for fixing the misplaced start position of the scan animation in ocrweb.
+- Many thanks to [zhsunlight](https://github.com/zhsunlight) for suggesting parameterized GPU inference calls and for careful, thorough testing.
+- Many thanks to [lzh111222334](https://github.com/lzh111222334) for fixing bugs in the rec preprocessing of the Python version.
+- Many thanks to [AutumnSun1996](https://github.com/AutumnSun1996) for the suggestion in [#42](https://github.com/RapidAI/RapidOCR/issues/42).
+- Many thanks to [DeadWood8](https://github.com/DeadWood8) for providing the [documentation](https://rapidai.github.io/RapidOCRDocs/main/install_usage/rapidocr_web/nuitka_package/) on packaging rapidocr_web into an executable with Nuitka.
+- Many thanks to [Loovelj](https://github.com/Loovelj) for fixing the bug of sorting the text boxes. For details see [issue 75](https://github.com/RapidAI/RapidOCR/issues/75).
+
+### 🤝 Contribution Guide
+
+This repository contains the **Python** component of RapidOCR. Components for other languages have been migrated to separate repositories.
+
+For the complete workflow on contributing to Python development, please refer to: [**Python CONTRIBUTING**](docs/CONTRIBUTING.md).
+
+### 🎖 Code Contributors
+
+<p align="left">
+  <a href="https://github.com/RapidAI/RapidOCR/graphs/contributors">
+    <img src="https://contrib.rocks/image?repo=RapidAI/RapidOCR&max=400&columns=10" width="60%"/>
+  </a>
+</p>
+
+### 🌟 Sponsors & Backers
+
+RapidOCR is an Apache2.0-licensed open source project with its ongoing development made possible entirely by the support of these awesome backers. If you'd like to join them, please consider [sponsoring RapidOCR's development](https://rapidai.github.io/RapidOCRDocs/main/sponsor/).
+
+#### Sponsors
+
+|Sponsors|Application|Introduction|
+|:---:|:---:|:---|
+|<img src="https://github.com/RapidAI/RapidOCR/releases/download/v1.1.0/Quicker.jpg" width=65 height=65  style="border-radius: 12px; border: none;"/>|[Quicker](https://getquicker.net/)|Your fingertip toolbox|
+
+#### Backers
+
+<table>
+  <tr>
+    <td align="center">
+      <a href="https://github.com/zg-young">
+        <img src="https://avatars.githubusercontent.com/u/56186938?v=4" width="60" style="border-radius: 12px; border: none;"/>
+      </a><br />
+      <sub><a href="https://github.com/zg-young">@zg-young</a></sub>
+    </td>
+    <td align="center">
+      <a href="https://github.com/Eunsolfs">
+        <img src="https://avatars.githubusercontent.com/u/53815751?v=4" width="60"  style="border-radius: 12px; border: none;"/>
+      </a><br />
+      <sub><a href="https://github.com/Eunsolfs">@Eunsolfs</a></sub>
+    </td>
+    <td align="center">
+      <a href="https://github.com/youzzhang">
+        <img src="https://avatars.githubusercontent.com/u/49047676?v=4" width="60" style="border-radius: 12px; border: none;"/>
+      </a><br />
+      <sub><a href="https://github.com/youzzhang">@youzzhang</a></sub>
+    </td>
+  </tr>
+</table>
+
+### 📜 Citation
+
+If you find this project useful in your research, please consider citing it:
+
+```bibtex
+@misc{RapidOCR2021,
+    title={{Rapid OCR}: OCR Toolbox},
+    author={RapidAI Team},
+    howpublished = {\url{https://github.com/RapidAI/RapidOCR}},
+    year={2021}
+}
+```
+
+### ⭐️ Stargazers over time
+
+[![Stargazers over time](https://starchart.cc/RapidAI/RapidOCR.svg?variant=adaptive)](https://starchart.cc/RapidAI/RapidOCR)
+
+### ⚖️ License
+
+The copyright of the OCR model is held by Baidu, while the copyrights of all other engineering scripts are retained by the repository's owner.
+
+This project is released under the [Apache 2.0 license](./LICENSE).

+ 1 - 0
python/RapidOCR/android/README.md

@@ -0,0 +1 @@
+See [RapidOcrAndroidOnnx](https://github.com/RapidAI/RapidOcrAndroidOnnx) for details.

+ 3 - 0
python/RapidOCR/api/README.md

@@ -0,0 +1,3 @@
+### See [RapidOCRAPI](https://github.com/RapidAI/RapidOCRAPI) for details
+
+### See [Documentation](https://rapidai.github.io/RapidOCRDocs/main/install_usage/rapidocr_api/usage/)

File diff not shown because the file is too large
+ 191 - 0
python/RapidOCR/assets/RapidOCRDemo.ipynb


binary
python/RapidOCR/assets/RapidOCR_LOGO.png


File diff not shown because the file is too large
+ 0 - 0
python/RapidOCR/assets/colab-badge.svg


+ 96 - 0
python/RapidOCR/cliff.toml

@@ -0,0 +1,96 @@
+[changelog]
+body = """
+{% for group, commits in commits | group_by(attribute="group") %}
+    ### {{ group | striptags | trim | upper_first }}
+    {% for commit in commits
+    | filter(attribute="scope")
+    | sort(attribute="scope") %}
+        - **({{commit.scope}})**{% if commit.breaking %} [**breaking**]{% endif %} \
+            {{ commit.message }} by [@{{ commit.author.name }}](https://github.com/{{ commit.author.name }}) in [{{ commit.id | truncate(length=7, end="") }}]($REPO/commit/{{ commit.id }})
+    {%- endfor -%}
+    {% raw %}\n{% endraw %}\
+    {%- for commit in commits %}
+        {%- if commit.scope -%}
+        {% else -%}
+            - {% if commit.breaking %} [**breaking**]{% endif %}\
+                {{ commit.message }} by [@{{ commit.author.name }}](https://github.com/{{ commit.author.name }}) in [{{ commit.id | truncate(length=7, end="") }}]($REPO/commit/{{ commit.id }})
+        {% endif -%}
+    {% endfor -%}
+{% endfor %}
+
+
+{% if github.contributors | length > 0 %}
+### 🎉 Contributors
+
+{% for contributor in github.contributors %}
+  - [@{{ contributor.username }}](https://github.com/{{ contributor.username }})
+{%- endfor -%}
+{% endif %}
+
+
+{% if version %}
+    {% if previous.version %}\
+        **Full Changelog**:  [{{ version | trim_start_matches(pat="v") }}]($REPO/compare/{{ previous.version }}..{{ version }})
+    {% else %}\
+        **Full Changelog**:  [{{ version | trim_start_matches(pat="v") }}]
+    {% endif %}\
+{% else %}\
+    ## [unreleased]
+{% endif %}
+"""
+
+footer = """
+
+"""
+
+# Remove leading and trailing whitespaces from the changelog's body.
+trim = true
+postprocessors = [
+    # Replace the placeholder `$REPO` with the repository URL.
+    { pattern = '\$REPO', replace = "https://github.com/RapidAI/RapidOCR" }, # replace repository URL
+    # Strip trailing whitespace from the end of each line
+    { pattern = "(?m)[ \t]+$", replace = "" },
+    # Collapse runs of consecutive blank lines into at most one blank line
+    { pattern = "\n{3,}", replace = "\n\n" },
+]
+
+[git]
+# Parse commits according to the conventional commits specification.
+# See https://www.conventionalcommits.org
+conventional_commits = true
+# Exclude commits that do not match the conventional commits specification.
+filter_unconventional = false
+# Split commits on newlines, treating each line as an individual commit.
+split_commits = false
+# An array of regex based parsers to modify commit messages prior to further processing.
+commit_preprocessors = [
+    # Replace issue numbers with link templates to be updated in `changelog.postprocessors`.
+    #{ pattern = '\((\w+\s)?#([0-9]+)\)', replace = "([#${2}](https://github.com/orhun/git-cliff/issues/${2}))"},
+]
+# An array of regex based parsers for extracting data from the commit message.
+# Assigns commits to groups.
+# Optionally sets the commit's scope and can decide to exclude commits from further processing.
+commit_parsers = [
+  { message = "^feat", group = "<!-- 0 -->🚀 Features" },
+  { message = "^fix", group = "<!-- 1 -->🐛 Bug Fixes" },
+  { message = "^doc", group = "<!-- 3 -->📚 Documentation" },
+  { message = "^perf", group = "<!-- 4 -->⚡ Performance" },
+  { message = "^refactor", group = "<!-- 2 -->🚜 Refactor" },
+  { message = "^style", group = "<!-- 5 -->🎨 Styling" },
+  { message = "^test", group = "<!-- 6 -->🧪 Testing" },
+  { message = "^chore\\(release\\): prepare for", skip = true },
+  { message = "^chore\\(deps.*\\)", skip = true },
+  { message = "^chore\\(pr\\)", skip = true },
+  { message = "^chore\\(pull\\)", skip = true },
+  { message = "^chore|^ci", group = "<!-- 7 -->⚙️ Miscellaneous Tasks" },
+  { body = ".*security", group = "<!-- 8 -->🛡️ Security" },
+  { message = "^revert", group = "<!-- 9 -->◀️ Revert" },
+  { message = ".*", group = "<!-- 10 -->💼 Other" },
+]
+# Exclude commits that are not matched by any commit parser.
+filter_commits = false
+# Order releases topologically instead of chronologically.
+topo_order = false
+# Order of commits in each group/release within the changelog.
+# Allowed values: newest, oldest
+sort_commits = "newest"
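The `commit_parsers` list in `cliff.toml` above depends on ordering: the skip rules for `chore(release)`, `chore(deps...)`, `chore(pr)` and `chore(pull)` sit before the generic `^chore|^ci` rule, and `.*` comes last as a catch-all. The following is a minimal Python sketch of that behavior, assuming git-cliff tries the parsers in file order and stops at the first match. The names `COMMIT_PARSERS` and `classify` are hypothetical, the group labels are shortened, and the body-based security parser is left out for simplicity.

```python
import re
from typing import Optional

# (pattern, group) pairs mirroring the message-based parsers in cliff.toml,
# in file order. A group of None stands for `skip = true`.
# Assumption: the body-based "security" parser is omitted from this sketch.
COMMIT_PARSERS = [
    (r"^feat", "Features"),
    (r"^fix", "Bug Fixes"),
    (r"^doc", "Documentation"),
    (r"^perf", "Performance"),
    (r"^refactor", "Refactor"),
    (r"^style", "Styling"),
    (r"^test", "Testing"),
    (r"^chore\(release\): prepare for", None),
    (r"^chore\(deps.*\)", None),
    (r"^chore\(pr\)", None),
    (r"^chore\(pull\)", None),
    (r"^chore|^ci", "Miscellaneous Tasks"),
    (r"^revert", "Revert"),
    (r".*", "Other"),
]


def classify(message: str) -> Optional[str]:
    """Return the group of the first matching parser, or None if skipped."""
    for pattern, group in COMMIT_PARSERS:
        if re.search(pattern, message):
            return group
    return None  # not reached while the `.*` catch-all is present


if __name__ == "__main__":
    for msg in ("feat(python): add x", "chore(deps): bump numpy", "update readme"):
        print(msg, "->", classify(msg))
```

Under the first-match assumption, moving `^chore|^ci` above the skip rules would put dependency bumps and release commits into the changelog, which is why the order in the file matters.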

+ 3 - 0
python/RapidOCR/cpp/README.md

@@ -0,0 +1,3 @@
+See [RapidOcrNcnn](https://github.com/RapidAI/RapidOcrNcnn) for details.
+
+See [RapidOcrOnnx](https://github.com/RapidAI/RapidOcrOnnx) for details.

+ 25 - 0
python/RapidOCR/docker/README.md

@@ -0,0 +1,25 @@
+# RapidOCR API Docker
+
+## Quickstart
+
+```bash
+docker pull qingchen0607/rapid-ocr-api:v20250619
+docker run -itd --restart=always --name rapidocr_api -p 9005:9005 qingchen0607/rapid-ocr-api:v20250619 
+
+# http://<ip>:9005/docs
+```
+
+## Local Build
+
+```shell
+cd docker
+
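+# The script names contain `&`, so quote them; unquoted, the shell treats `&` as the background operator.
+#chmod +x 'docker_build&run.sh' 'docker_stop&clean.sh'
+
+# build image and run
+./'docker_build&run.sh'
+
+# stop and remove the container and image
+./'docker_stop&clean.sh'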
+```
+

+ 2 - 0
python/RapidOCR/docker/docker_build&run.sh

@@ -0,0 +1,2 @@
+docker build -t rapidocr_api --network host .
+docker run -itd --restart=always --name rapidocr_api -p 9005:9005 rapidocr_api

+ 3 - 0
python/RapidOCR/docker/docker_stop&clean.sh

@@ -0,0 +1,3 @@
+docker stop rapidocr_api
+docker rm rapidocr_api
+docker rmi rapidocr_api

+ 8 - 0
python/RapidOCR/docker/dockerfile

@@ -0,0 +1,8 @@
+FROM python:3.10.11-slim-buster
+ENV DEBIAN_FRONTEND=noninteractive
+WORKDIR /app
+RUN pip install --no-cache-dir onnxruntime rapidocr_api -i https://mirrors.aliyun.com/pypi/simple
+RUN pip uninstall -y opencv-python && \
+    pip install --no-cache-dir opencv-python-headless -i https://mirrors.aliyun.com/pypi/simple
+EXPOSE 9005
+CMD ["bash", "-c", "rapidocr_api -ip 0.0.0.0 -p 9005 -workers 2"]

+ 250 - 0
python/RapidOCR/docs/CONTRIBUTING-CN.md

@@ -0,0 +1,250 @@
+# RapidOCR Python 贡献指南
+
+感谢你对 RapidOCR Python 部分的关注!本文档说明如何参与 Python 目录下的代码开发与贡献,包括环境准备、开发流程和提交流程。
+
+## 前置要求
+
+- Python >= 3.6(推荐 3.8+)
+- Git
+- 已注册的 GitHub 账号
+
+---
+
+## 一、克隆源码
+
+从 RapidOCR 主仓库克隆项目到本地:
+
+```bash
+git clone https://github.com/RapidAI/RapidOCR.git
+cd RapidOCR
+```
+
+若网络受限,可使用镜像或代理;也可先 fork 到个人账号后再克隆(见后文「准备提交」部分)。
+
+---
+
+## 二、进入 Python 目录并配置环境
+
+```bash
+cd python
+```
+
+建议使用虚拟环境,避免与系统 Python 冲突:
+
+```bash
+# 使用 venv
+python -m venv .venv
+source .venv/bin/activate   # Linux/macOS
+# .venv\Scripts\activate    # Windows
+
+# 或使用 conda
+conda create -n rapidocr python=3.10
+conda activate rapidocr
+```
+
+安装依赖(开发时建议可编辑安装以便本地修改生效):
+
+```bash
+pip install -r requirements.txt
+pip install pytest  # 运行单元测试需要
+# 可选:以可编辑模式安装当前包,便于调试
+pip install -e .
+```
+
+如需使用 ONNX Runtime 等推理后端,请按 [文档](https://rapidai.github.io/RapidOCRDocs/main/install_usage/rapidocr/install/) 安装对应依赖(如 `rapidocr_onnxruntime` 等)。
+
+---
+
+## 三、安装代码格式化与 pre-commit 钩子
+
+在 **开发者环境** 下安装 pre-commit,并启用 Git 提交前钩子,以便自动做代码格式检查与整理(如 black、autoflake 等):
+
+```bash
+# 在 python 目录下、已激活的虚拟环境中安装
+pip install pre-commit
+
+# 到仓库根目录 RapidOCR 下安装 Git 钩子(.pre-commit-config.yaml 在根目录)
+cd ..   # 若当前在 python 目录,先回到仓库根目录
+pre-commit install
+```
+
+安装成功后,每次执行 `git commit` 时会自动运行配置好的格式化工具;若检查未通过,提交会被拒绝,请根据提示修改后再次提交。也可在提交前手动跑一遍:
+
+```bash
+# 在仓库根目录执行
+pre-commit run --all-files
+```
+
+---
+
+## 四、运行单元测试
+
+在 **`python` 目录下** 执行:
+
+```bash
+# 运行全部测试
+pytest tests/ -v
+
+# 仅运行部分测试文件
+pytest tests/test_input.py -v
+pytest tests/test_det_cls_rec.py -v
+
+# 查看测试覆盖率(需先安装 pytest-cov)
+pytest tests/ -v --cov=rapidocr
+```
+
+确认当前主分支在你本机环境下测试通过,再进行修改。
+
+---
+
+## 五、复现问题 / 增加新功能
+
+### 复现 Bug
+
+1. 在 [Issues](https://github.com/RapidAI/RapidOCR/issues) 中选定或创建对应 issue。
+2. 根据 issue 描述与报错信息,在本地用 `python` 目录下的代码复现问题。
+3. 在 `rapidocr/` 或 `tests/` 下定位并修改代码,直到问题消失。
+
+### 增加新功能
+
+1. 与 maintainer 或现有 issue 讨论需求与实现方式(可选但推荐)。
+2. 在 `rapidocr/` 下实现新逻辑,保持与现有代码风格一致(项目使用 [black](https://github.com/psf/black) 等规范)。
+3. 新功能应有对应单元测试覆盖。
+
+---
+
+## 六、编写对应单元测试
+
+- 测试文件放在 **`python/tests/`** 下,命名建议 `test_*.py`。
+- 使用 **pytest** 编写用例,可参考现有 `test_input.py`、`test_det_cls_rec.py`、`test_cli.py` 等。
+- 测试用图片等资源放在 `tests/test_files/`。
+- 新增测试应:
+    - 能稳定复现你要验证的行为(Bug 修复或新功能);
+    - 不依赖未在仓库或文档中说明的外部服务(必要时用 mock 或跳过)。
+
+示例:
+
+```python
+# tests/test_xxx.py
+import pytest
+from pathlib import Path
+
+root_dir = Path(__file__).resolve().parent.parent
+tests_dir = root_dir / "tests" / "test_files"
+
+@pytest.fixture()
+def engine():
+    from rapidocr import RapidOCR
+    return RapidOCR()
+
+def test_your_new_feature(engine):
+    img_path = tests_dir / "ch_en_num.jpg"
+    result = engine(img_path)
+    assert result is not None
+    # 更多断言...
+```
+
+---
+
+## 七、运行所有单元测试
+
+在 **`python` 目录下** 再次全量跑测,确保无回归:
+
+```bash
+pytest tests/ -v
+```
+
+若有测试被跳过(如缺少某推理引擎),请确认你修改或新增的测试在现有环境下已执行并通过。
+
+---
+
+## 八、准备提交到仓库
+
+### 8.1 Fork RapidOCR 主仓库到个人账号
+
+1. 打开 [RapidOCR 主仓库](https://github.com/RapidAI/RapidOCR)。
+2. 点击右上角 **Fork**,将仓库 fork 到你自己的 GitHub 账号下(例如 `https://github.com/你的用户名/RapidOCR`)。
+
+### 8.2 将代码提交到个人 Fork
+
+若最初是克隆的主仓库,需要把远程改为你的 fork,并推送到 fork:
+
+```bash
+# 在项目根目录 RapidOCR 下执行
+git remote add myfork https://github.com/你的用户名/RapidOCR.git
+# 若已有 origin 且就是主仓库,可保留;推送时用 myfork
+
+# 创建分支(推荐为每个 issue/功能单独分支)
+git checkout -b fix/xxx   # 或 feat/xxx
+
+# 添加并提交你在 python 目录下的修改
+git add python/
+git status   # 确认只提交预期文件
+git commit -m "fix(python): 简短描述"
+
+# 推送到你的 fork
+git push myfork fix/xxx
+```
+
+**请按约定式提交规范(Conventional Commits)书写 commit 信息** ,便于维护者阅读与自动生成 Changelog。格式为:
+
+```text
+<类型>[可选范围]: <简短描述>
+
+[可选正文]
+[可选脚注]
+```
+
+常用类型示例:
+
+| 类型     | 说明         |
+|----------|--------------|
+| `feat`   | 新功能       |
+| `fix`    | Bug 修复     |
+| `docs`   | 文档变更     |
+| `style`  | 代码格式(不影响逻辑) |
+| `refactor` | 重构       |
+| `test`   | 测试相关     |
+| `chore`  | 构建/工具等  |
+
+示例:`fix(python): 修复某条件下识别结果为空`、`feat(python): 支持 xxx 输入格式`。
+
+### 8.3 向 RapidOCR 主仓库提交 Pull Request(PR)
+
+1. 打开你 fork 后的仓库页面(如 `https://github.com/你的用户名/RapidOCR`)。
+2. 若刚推送分支,页面上通常会出现 **Compare & pull request**,点击即可;否则在 **Branches** 里选择你刚推送的分支,再点 **New pull request** 。
+3. 确认 **base 仓库** 为 `RapidAI/RapidOCR`、**base 分支** 为 `main`(或仓库默认主分支),**head 仓库** 为你的 fork、**head 分支** 为你的分支(如 `fix/xxx`)。
+4. 填写 PR 标题和说明:
+   - 标题:简要概括修改内容(如「Fix: 修复 Python 下 xxx 问题」)。
+   - 说明中建议包含:
+     - 对应 Issue 编号(若有):`Fixes #123` 或 `Related to #123`。
+     - 修改原因与主要改动。
+     - 如何验证:例如「在 python 目录下执行 `pytest tests/ -v` 通过」。
+5. 提交 PR,等待 maintainer 审查;根据反馈再在本地修改并推送同一分支,PR 会自动更新。
+
+---
+
+## 流程小结
+
+| 步骤 | 说明 |
+|------|------|
+| 1 | 克隆 RapidOCR 源码 |
+| 2 | 进入 `python` 目录,配置虚拟环境并安装依赖与 pytest |
+| 3 | 安装 pre-commit(`pip install pre-commit`),在仓库根目录执行 `pre-commit install` 安装 Git 钩子 |
+| 4 | 运行单元测试,确认基线通过 |
+| 5 | 复现问题或实现新功能 |
+| 6 | 编写/补充对应单元测试 |
+| 7 | 在 `python` 目录下运行全部测试并确认通过 |
+| 8 | Fork 主仓库到个人账号 |
+| 9 | 按约定式提交规范编写 commit,将修改提交并推送到个人 Fork 的对应分支 |
+| 10 | 在主仓库创建 PR,从个人 Fork 分支指向主仓库 main |
+
+---
+
+## 其他说明
+
+- **代码风格** :项目采用 [black](https://github.com/psf/black)、autoflake 等规范,已通过 pre-commit 钩子在提交时自动检查;也可在仓库根目录执行 `pre-commit run --all-files` 手动跑一遍。
+- **文档** :更多安装与使用说明见 [RapidOCR 文档](https://rapidai.github.io/RapidOCRDocs/)。
+- **问题与讨论** :Bug 与功能建议可通过 [GitHub Issues](https://github.com/RapidAI/RapidOCR/issues) 反馈。
+
+再次感谢你的贡献!

+ 250 - 0
python/RapidOCR/docs/CONTRIBUTING.md

@@ -0,0 +1,250 @@
+# RapidOCR Python Contributing Guide
+
+Thanks for your interest in contributing to the RapidOCR Python codebase! This guide explains how to set up your environment, develop, and submit changes for the `python` directory, including running tests and opening pull requests.
+
+## Prerequisites
+
+- Python >= 3.6 (3.8+ recommended)
+- Git
+- A GitHub account
+
+---
+
+## 1. Clone the repository
+
+Clone the RapidOCR repository to your machine:
+
+```bash
+git clone https://github.com/RapidAI/RapidOCR.git
+cd RapidOCR
+```
+
+If you have network restrictions, use a mirror or proxy; you can also fork the repo to your account first and clone your fork (see “Preparing to submit” below).
+
+---
+
+## 2. Enter the Python directory and set up the environment
+
+```bash
+cd python
+```
+
+Use a virtual environment to avoid conflicts with the system Python:
+
+```bash
+# Using venv
+python -m venv .venv
+source .venv/bin/activate   # Linux/macOS
+# .venv\Scripts\activate    # Windows
+
+# Or using conda
+conda create -n rapidocr python=3.10
+conda activate rapidocr
+```
+
+Install dependencies (editable install is recommended for local development):
+
+```bash
+pip install -r requirements.txt
+pip install pytest  # required to run tests
+# Optional: install the package in editable mode for debugging
+pip install -e .
+```
+
+For inference backends such as ONNX Runtime, follow the [documentation](https://rapidai.github.io/RapidOCRDocs/main/install_usage/rapidocr/install/) to install the corresponding packages (e.g. `rapidocr_onnxruntime`).
+
+---
+
+## 3. Install code formatting and pre-commit hooks
+
+Install pre-commit in your **development environment** and enable Git pre-commit hooks so that code is automatically formatted and checked (e.g. black, autoflake):
+
+```bash
+# From the python directory with your venv activated
+pip install pre-commit
+
+# Go to the repository root to install Git hooks (.pre-commit-config.yaml is in the root)
+cd ..   # if you are in python/, go back to the repo root
+pre-commit install
+```
+
+After installation, each `git commit` will run the configured checks; if they fail, the commit will be rejected. Fix the reported issues and commit again. You can also run checks manually before committing:
+
+```bash
+# From the repository root
+pre-commit run --all-files
+```
+
+---
+
+## 4. Run unit tests
+
+From the **`python`** directory:
+
+```bash
+# Run all tests
+pytest tests/ -v
+
+# Run specific test files
+pytest tests/test_input.py -v
+pytest tests/test_det_cls_rec.py -v
+
+# Run with coverage (requires pytest-cov)
+pytest tests/ -v --cov=rapidocr
+```
+
+Make sure the current main branch passes tests in your environment before making changes.
+
+---
+
+## 5. Reproduce the issue or add a new feature
+
+### Reproducing a bug
+
+1. Pick or open an issue on [GitHub Issues](https://github.com/RapidAI/RapidOCR/issues).
+2. Reproduce the problem locally using the code under the `python` directory and the issue description.
+3. Locate and fix the code in `rapidocr/` or `tests/` until the issue is resolved.
+
+### Adding a new feature
+
+1. (Optional but recommended) Discuss the requirement and approach with maintainers or in an existing issue.
+2. Implement the feature under `rapidocr/`, following the existing style (the project uses [black](https://github.com/psf/black) and related tools).
+3. Add unit tests for the new feature.
+
+---
+
+## 6. Write the corresponding unit tests
+
+- Place test files under **`python/tests/`** with names like `test_*.py`.
+- Use **pytest**. You can refer to existing tests such as `test_input.py`, `test_det_cls_rec.py`, and `test_cli.py`.
+- Put test assets (e.g. images) in `tests/test_files/`.
+- New tests should:
+    - Reliably verify the behavior you changed (bug fix or new feature).
+    - Avoid depending on external services not documented in the repo (use mocks or skip when needed).
+
+Example:
+
+```python
+# tests/test_xxx.py
+import pytest
+from pathlib import Path
+
+root_dir = Path(__file__).resolve().parent.parent
+tests_dir = root_dir / "tests" / "test_files"
+
+@pytest.fixture()
+def engine():
+    from rapidocr import RapidOCR
+    return RapidOCR()
+
+def test_your_new_feature(engine):
+    img_path = tests_dir / "ch_en_num.jpg"
+    result = engine(img_path)
+    assert result is not None
+    # more assertions...
+```
+
+---
+
+## 7. Run all unit tests
+
+From the **`python`** directory, run the full test suite again to avoid regressions:
+
+```bash
+pytest tests/ -v
+```
+
+If some tests are skipped (e.g. missing an inference engine), ensure that the tests you added or changed run and pass in your environment.
+
+---
+
+## 8. Prepare to submit to the repository
+
+### 8.1 Fork the RapidOCR repository to your account
+
+1. Open the [RapidOCR repository](https://github.com/RapidAI/RapidOCR).
+2. Click **Fork** to create a fork under your GitHub account (e.g. `https://github.com/YOUR_USERNAME/RapidOCR`).
+
+### 8.2 Commit and push to your fork
+
+If you cloned the upstream repo, add your fork as a remote and push your branch:
+
+```bash
+# Run from the repository root (RapidOCR)
+git remote add myfork https://github.com/YOUR_USERNAME/RapidOCR.git
+# If origin points to upstream, keep it; use myfork for pushing
+
+# Create a branch (one branch per issue or feature is recommended)
+git checkout -b fix/xxx   # or feat/xxx
+
+# Stage and commit your changes under python/
+git add python/
+git status   # confirm only intended files are staged
+git commit -m "fix(python): short description"
+
+# Push to your fork
+git push myfork fix/xxx
+```
+
+**Please follow the [Conventional Commits](https://www.conventionalcommits.org/) specification** for commit messages so that maintainers can read the history easily and changelogs can be generated automatically. Format:
+
+```text
+<type>[optional scope]: <short description>
+
+[optional body]
+[optional footer]
+```
+
+Common types:
+
+| Type       | Description                    |
+|------------|--------------------------------|
+| `feat`     | New feature                    |
+| `fix`      | Bug fix                        |
+| `docs`     | Documentation changes          |
+| `style`    | Code style (no logic change)   |
+| `refactor` | Refactoring                    |
+| `test`     | Tests                          |
+| `chore`    | Build / tooling, etc.          |
+
+Examples: `fix(python): empty result under certain conditions`, `feat(python): support xxx input format`.
+
+### 8.3 Open a Pull Request (PR) to the main repository
+
+1. Open your fork in the browser (e.g. `https://github.com/YOUR_USERNAME/RapidOCR`).
+2. After pushing, you will usually see **Compare & pull request**; click it. Otherwise, select your branch under **Branches** and click **New pull request**.
+3. Set **base** to `RapidAI/RapidOCR` and branch `main` (or the default branch). Set **head** to your fork and your branch (e.g. `fix/xxx`).
+4. Fill in the PR title and description:
+   - **Title**: Short summary (e.g. “Fix: xxx in Python”).
+   - **Description** should include:
+     - Related issue: `Fixes #123` or `Related to #123` if applicable.
+     - Reason for the change and what was done.
+     - How to verify (e.g. “`pytest tests/ -v` in the python directory passes”).
+5. Submit the PR and wait for a maintainer to review it. To address feedback, commit the changes locally and push to the same branch; the PR will update automatically.
+
+---
+
+## Summary
+
+| Step | Description |
+|------|-------------|
+| 1 | Clone the RapidOCR repository |
+| 2 | Go to the `python` directory, set up a venv, and install dependencies and pytest |
+| 3 | Install pre-commit (`pip install pre-commit`) and run `pre-commit install` from the repo root |
+| 4 | Run unit tests and confirm they pass |
+| 5 | Reproduce the issue or implement the new feature |
+| 6 | Add or update the corresponding unit tests |
+| 7 | Run the full test suite from the `python` directory and confirm it passes |
+| 8 | Fork the main repository to your account |
+| 9 | Write commits using Conventional Commits and push to your fork |
+| 10 | Open a PR from your fork’s branch to the main repository’s `main` |
+
+---
+
+## Notes
+
+- **Code style**: The project uses [black](https://github.com/psf/black), autoflake, etc. Pre-commit runs these on commit. You can also run `pre-commit run --all-files` from the repo root.
+- **Documentation**: See the [RapidOCR docs](https://rapidai.github.io/RapidOCRDocs/) for installation and usage.
+- **Issues and discussion**: Report bugs and suggest features via [GitHub Issues](https://github.com/RapidAI/RapidOCR/issues).
+
+Thank you for contributing!
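The commit message format described in section 8.2 of the guide above (`<type>[optional scope]: <short description>`) can be checked mechanically before pushing. The sketch below is one possible way to do that in Python. It is not part of the RapidOCR tooling: `ALLOWED_TYPES`, `SUBJECT_RE` and `parse_subject` are hypothetical names, the accepted types are just the ones listed in the guide's table, and the optional `!` breaking-change marker comes from the Conventional Commits specification rather than from the guide.

```python
import re

# Types taken from the table in the contributing guide; extend if needed.
ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# <type>[optional scope][optional !]: <short description>
SUBJECT_RE = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def parse_subject(subject: str):
    """Return (type, scope, description), or None if the subject is malformed."""
    match = SUBJECT_RE.match(subject)
    if match is None:
        return None
    return match.group("type"), match.group("scope"), match.group("description")


if __name__ == "__main__":
    for line in (
        "fix(python): empty result under certain conditions",
        "feat: support xxx input format",
        "Fix: xxx in Python",
    ):
        print(repr(line), "->", parse_subject(line))
```

The check is case-sensitive and requires a single space after the colon, so a PR-style title such as `Fix: xxx in Python` is rejected as a commit subject even though it is fine as a PR title.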

+ 1 - 0
python/RapidOCR/docs/doc_whl_rapidocr.md

@@ -0,0 +1 @@
+### See [Documentation](https://rapidai.github.io/RapidOCRDocs)

+ 1 - 0
python/RapidOCR/dotnet/README.md

@@ -0,0 +1 @@
+See [RapidOCRCSharp](https://github.com/RapidAI/RapidOCRCSharp) for details.

+ 5 - 0
python/RapidOCR/ios/README.md

@@ -0,0 +1,5 @@
+
+
+Volunteer contributors are needed; please join the QQ group to get in touch. QQ group number: 887298230
+
+A contributor is wanted.

+ 3 - 0
python/RapidOCR/jvm/README.md

@@ -0,0 +1,3 @@
+See [RapidOcrNcnnJvm](https://github.com/RapidAI/RapidOcrNcnnJvm) for details.
+
+See [RapidOcrOnnxJvm](https://github.com/RapidAI/RapidOcrOnnxJvm) for details.

+ 1 - 0
python/RapidOCR/ocrweb/README.md

@@ -0,0 +1 @@
+See [RapidOCRWeb](https://github.com/RapidAI/RapidOCRWeb) for details.

+ 1 - 0
python/RapidOCR/python/README.md

@@ -0,0 +1 @@
+### See [Documentation](https://rapidai.github.io/RapidOCRDocs/main/install_usage/rapidocr/install/)

+ 12 - 0
python/RapidOCR/python/demo.py

@@ -0,0 +1,12 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from rapidocr import RapidOCR
+
+engine = RapidOCR()
+
+img_url = "https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/master/resources/test_files/ch_en_num.jpg"
+result = engine(img_url)
+print(result)
+
+result.vis("vis_result.jpg")

+ 19 - 0
python/RapidOCR/python/rapidocr/__init__.py

@@ -0,0 +1,19 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from .main import RapidOCR
+from .utils.load_image import LoadImageError
+from .utils.typings import EngineType, LangCls, LangDet, LangRec, ModelType, OCRVersion
+from .utils.vis_res import VisRes
+
+__all__ = [
+    "RapidOCR",
+    "LoadImageError",
+    "EngineType",
+    "LangCls",
+    "LangDet",
+    "LangRec",
+    "ModelType",
+    "OCRVersion",
+    "VisRes",
+]

+ 6 - 0
python/RapidOCR/python/rapidocr/cal_rec_boxes/__init__.py

@@ -0,0 +1,6 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from .main import CalRecBoxes
+
+__all__ = ["CalRecBoxes"]

+ 335 - 0
python/RapidOCR/python/rapidocr/cal_rec_boxes/main.py

@@ -0,0 +1,335 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL / Joker1212
+# @Contact: liekkaskono@163.com
+import copy
+import math
+from enum import Enum
+from typing import List, Tuple
+
+import cv2
+import numpy as np
+
+from ..ch_ppocr_rec.typings import TextRecOutput, WordInfo, WordType
+from ..utils.utils import quads_to_rect_bbox
+
+
+class Direction(Enum):
+    HORIZONTAL = "horizontal_direct"  # horizontal
+    VERTICAL = "vertical_direct"  # vertical
+
+
+class CalRecBoxes:
+    """Compute bounding boxes for recognized text: one box per Chinese character and one per English word.
+    Adapted from PaddlePaddle/PaddleOCR and fanqie03/char-detection."""
+
+    def __call__(
+        self,
+        imgs: List[np.ndarray],
+        dt_boxes: np.ndarray,
+        rec_res: TextRecOutput,
+        return_single_char_box: bool = False,
+    ) -> TextRecOutput:
+        word_results = []
+        for idx, (img, box) in enumerate(zip(imgs, dt_boxes)):
+            if rec_res.txts is None or img.size == 0 or rec_res.word_results is None:
+                continue
+
+            h, w = img.shape[:2]
+            img_box = np.array([[0, 0], [w, 0], [w, h], [0, h]])
+            word_box_content_list, word_box_list, conf_list = self.cal_ocr_word_box(
+                rec_res.txts[idx],
+                img_box,
+                rec_res.word_results[idx],
+                return_single_char_box,
+            )
+            word_box_list = self.adjust_box_overlap(copy.deepcopy(word_box_list))
+            direction = self.get_box_direction(box)
+            word_box_list = self.reverse_rotate_crop_image(
+                copy.deepcopy(box), word_box_list, direction
+            )
+            word_results.append(
+                list(zip(word_box_content_list, conf_list, word_box_list))
+            )
+
+        rec_res.word_results = tuple(word_results)
+        return rec_res
+
+    @staticmethod
+    def get_box_direction(box: np.ndarray) -> Direction:
+        edge_lengths = [
+            float(np.linalg.norm(box[0] - box[1])),  # top edge
+            float(np.linalg.norm(box[1] - box[2])),  # right edge
+            float(np.linalg.norm(box[2] - box[3])),  # bottom edge
+            float(np.linalg.norm(box[3] - box[0])),  # left edge
+        ]
+
+        # Width and height are the longer of each pair of opposite edges
+        width = max(edge_lengths[0], edge_lengths[2])
+        height = max(edge_lengths[1], edge_lengths[3])
+
+        if width < 1e-6:
+            return Direction.VERTICAL
+
+        aspect_ratio = round(height / width, 2)
+        return Direction.VERTICAL if aspect_ratio >= 1.5 else Direction.HORIZONTAL
+
+    def cal_ocr_word_box(
+        self,
+        rec_txt: str,
+        bbox: np.ndarray,
+        word_info: WordInfo,
+        return_single_char_box: bool = False,
+    ) -> Tuple[List[str], List[List[List[float]]], List[float]]:
+        """Calculate the box of each word from the OCR recognition and detection results.
+        Chinese text gets one box per character.
+        English text gets one box per word.
+        Three cases:
+        1. All Chinese
+        2. All English
+        3. Mixed Chinese and English
+        """
+        if not rec_txt or word_info.line_txt_len == 0:
+            return [], [], []
+
+        bbox_points = quads_to_rect_bbox(bbox[None, ...])
+        avg_col_width = (bbox_points[2] - bbox_points[0]) / word_info.line_txt_len
+
+        is_all_en_num = all(v is WordType.EN_NUM for v in word_info.word_types)
+
+        line_cols, char_widths, word_contents = [], [], []
+        for word, word_col in zip(word_info.words, word_info.word_cols):
+            if is_all_en_num and not return_single_char_box:
+                line_cols.append(word_col)
+                word_contents.append("".join(word))
+            else:
+                line_cols.extend(word_col)
+                word_contents.extend(word)
+
+            if len(word_col) == 1:
+                continue
+
+            avg_width = self.calc_avg_char_width(word_col, avg_col_width)
+            char_widths.append(avg_width)
+
+        avg_char_width = self.calc_all_char_avg_width(
+            char_widths, bbox_points[0], bbox_points[2], len(rec_txt)
+        )
+
+        if is_all_en_num and not return_single_char_box:
+            word_boxes = self.calc_en_num_box(
+                line_cols, avg_char_width, avg_col_width, bbox_points
+            )
+        else:
+            word_boxes = self.calc_box(
+                line_cols, avg_char_width, avg_col_width, bbox_points
+            )
+        return word_contents, word_boxes, word_info.confs
+
+    def calc_en_num_box(
+        self,
+        line_cols: List[List[int]],
+        avg_char_width: float,
+        avg_col_width: float,
+        bbox_points: Tuple[float, float, float, float],
+    ) -> List[List[List[float]]]:
+        results = []
+        for one_col in line_cols:
+            cur_word_cell = self.calc_box(
+                one_col, avg_char_width, avg_col_width, bbox_points
+            )
+            x0, y0, x1, y1 = quads_to_rect_bbox(np.array(cur_word_cell))
+            results.append([[x0, y0], [x1, y0], [x1, y1], [x0, y1]])
+        return results
+
+    @staticmethod
+    def calc_box(
+        line_cols: List[int],
+        avg_char_width: float,
+        avg_col_width: float,
+        bbox_points: Tuple[float, float, float, float],
+    ) -> List[List[List[float]]]:
+        x0, y0, x1, y1 = bbox_points
+
+        results = []
+        for col_idx in line_cols:
+            # Place the center point at the middle of the column
+            center_x = (col_idx + 0.5) * avg_col_width
+
+            # Compute the left and right edges of the character cell
+            char_x0 = max(int(center_x - avg_char_width / 2), 0) + x0
+            char_x1 = min(int(center_x + avg_char_width / 2), x1 - x0) + x0
+            cell = [
+                [char_x0, y0],
+                [char_x1, y0],
+                [char_x1, y1],
+                [char_x0, y1],
+            ]
+            results.append(cell)
+        return sorted(results, key=lambda x: x[0][0])
+
+    @staticmethod
+    def calc_avg_char_width(word_col: List[int], each_col_width: float) -> float:
+        char_total_length = (word_col[-1] - word_col[0]) * each_col_width
+        return char_total_length / (len(word_col) - 1)
+
+    @staticmethod
+    def calc_all_char_avg_width(
+        width_list: List[float], bbox_x0: float, bbox_x1: float, txt_len: int
+    ) -> float:
+        if txt_len == 0:
+            return 0.0
+
+        if len(width_list) > 0:
+            return sum(width_list) / len(width_list)
+
+        return (bbox_x1 - bbox_x0) / txt_len
+
+    @staticmethod
+    def adjust_box_overlap(
+        word_box_list: List[List[List[float]]],
+    ) -> List[List[List[float]]]:
+        # Resolve overlaps between adjacent boxes
+        for i in range(len(word_box_list) - 1):
+            cur, nxt = word_box_list[i], word_box_list[i + 1]
+            if cur[1][0] > nxt[0][0]:  # the boxes overlap
+                distance = abs(cur[1][0] - nxt[0][0])
+                cur[1][0] -= distance / 2
+                cur[2][0] -= distance / 2
+                nxt[0][0] += distance - distance / 2
+                nxt[3][0] += distance - distance / 2
+        return word_box_list
+
+    def reverse_rotate_crop_image(
+        self,
+        bbox_points: np.ndarray,
+        word_points_list: List[List[List[float]]],
+        direction: Direction,
+    ) -> List[List[List[int]]]:
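+        """
+        Inverse operation of get_rotate_crop_image.
+        img is the original image.
+        part_img is the cropped image.
+        bbox_points is the bbox of part_img in the original image: four points in the order top-left, top-right, bottom-right, bottom-left.
+        part_points are points in part_img, [(x, y), (x, y)].
+        """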
+        left = int(np.min(bbox_points[:, 0]))
+        top = int(np.min(bbox_points[:, 1]))
+        bbox_points[:, 0] = bbox_points[:, 0] - left
+        bbox_points[:, 1] = bbox_points[:, 1] - top
+
+        img_crop_width = int(np.linalg.norm(bbox_points[0] - bbox_points[1]))
+        img_crop_height = int(np.linalg.norm(bbox_points[0] - bbox_points[3]))
+
+        pts_std = np.array(
+            [
+                [0, 0],
+                [img_crop_width, 0],
+                [img_crop_width, img_crop_height],
+                [0, img_crop_height],
+            ]
+        ).astype(np.float32)
+        M = cv2.getPerspectiveTransform(bbox_points, pts_std)
+        _, IM = cv2.invert(M)
+
+        new_word_points_list = []
+        for word_points in word_points_list:
+            new_word_points = []
+            for point in word_points:
+                new_point = point
+                if direction == Direction.VERTICAL:
+                    new_point = self.s_rotate(
+                        math.radians(-90), new_point[0], new_point[1], 0, 0
+                    )
+                    new_point[0] = new_point[0] + img_crop_width
+
+                p = np.array(new_point + [1])
+                x, y, z = np.dot(IM, p)
+                new_point = [x / z, y / z]
+
+                new_point = [int(new_point[0] + left), int(new_point[1] + top)]
+                new_word_points.append(new_point)
+            new_word_points = self.order_points(new_word_points)
+            new_word_points_list.append(new_word_points)
+        return new_word_points_list
+
+    @staticmethod
+    def s_rotate(angle, valuex, valuey, pointx, pointy):
+        """Rotate clockwise around (pointx, pointy).
+        https://blog.csdn.net/qq_38826019/article/details/84233397
+        """
+        valuex = np.array(valuex)
+        valuey = np.array(valuey)
+        sRotatex = (
+            (valuex - pointx) * math.cos(angle)
+            + (valuey - pointy) * math.sin(angle)
+            + pointx
+        )
+        sRotatey = (
+            (valuey - pointy) * math.cos(angle)
+            - (valuex - pointx) * math.sin(angle)
+            + pointy
+        )
+        return [sRotatex, sRotatey]
+
+    @staticmethod
+    def order_points(ori_box: List[List[int]]) -> List[List[int]]:
+        """Arrange the corner points of a rectangular box in a consistent order."""
+
+        def convert_to_1x2(p):
+            if p.shape == (2,):
+                return p.reshape((1, 2))
+
+            if p.shape == (1, 2):
+                return p
+            return p[:1, :]
+
+        box = np.array(ori_box).reshape((-1, 2))
+        center_x, center_y = np.mean(box[:, 0]), np.mean(box[:, 1])
+        if np.any(box[:, 0] == center_x) and np.any(
+            box[:, 1] == center_y
+        ):  # two points share an x-coordinate and two share a y-coordinate: diamond
+            p1 = box[np.where(box[:, 0] == np.min(box[:, 0]))]
+            p2 = box[np.where(box[:, 1] == np.min(box[:, 1]))]
+            p3 = box[np.where(box[:, 0] == np.max(box[:, 0]))]
+            p4 = box[np.where(box[:, 1] == np.max(box[:, 1]))]
+        elif np.all(box[:, 0] == center_x):  # all four points share the same x-coordinate
+            y_sort = np.argsort(box[:, 1])
+            p1 = box[y_sort[0]]
+            p2 = box[y_sort[1]]
+            p3 = box[y_sort[2]]
+            p4 = box[y_sort[3]]
+        elif np.any(box[:, 0] == center_x) and np.all(
+            box[:, 1] != center_y
+        ):  # only two points share an x-coordinate: split top/bottom first, then left/right
+            p12, p34 = (
+                box[np.where(box[:, 1] < center_y)],
+                box[np.where(box[:, 1] > center_y)],
+            )
+            p1, p2 = (
+                p12[np.where(p12[:, 0] == np.min(p12[:, 0]))],
+                p12[np.where(p12[:, 0] == np.max(p12[:, 0]))],
+            )
+            p3, p4 = (
+                p34[np.where(p34[:, 0] == np.max(p34[:, 0]))],
+                p34[np.where(p34[:, 0] == np.min(p34[:, 0]))],
+            )
+        else:  # only two points share a y-coordinate, or none are equal: split left/right first, then top/bottom
+            p14, p23 = (
+                box[np.where(box[:, 0] < center_x)],
+                box[np.where(box[:, 0] > center_x)],
+            )
+            p1, p4 = (
+                p14[np.where(p14[:, 1] == np.min(p14[:, 1]))],
+                p14[np.where(p14[:, 1] == np.max(p14[:, 1]))],
+            )
+            p2, p3 = (
+                p23[np.where(p23[:, 1] == np.min(p23[:, 1]))],
+                p23[np.where(p23[:, 1] == np.max(p23[:, 1]))],
+            )
+
+        # Work around the shape error that occurs when a single-character crop yields identical x-coordinates
+        p1 = convert_to_1x2(p1)
+        p2 = convert_to_1x2(p2)
+        p3 = convert_to_1x2(p3)
+        p4 = convert_to_1x2(p4)
+        return np.array([p1, p2, p3, p4]).reshape((-1, 2)).tolist()
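
Review note on `adjust_box_overlap` above: a minimal pure-Python sketch of the same idea, assuming each box is a list of four `[x, y]` corners ordered top-left, top-right, bottom-right, bottom-left. The name `split_overlap` and the sample boxes are illustrative and not part of the commit. When the right edge of one box passes the left edge of the next, both edges are moved to the midpoint of the overlap.

```python
from typing import List

# Four [x, y] corners: top-left, top-right, bottom-right, bottom-left.
Box = List[List[float]]


def split_overlap(boxes: List[Box]) -> List[Box]:
    """Shrink each overlapping neighbour pair so the boxes meet at the midpoint."""
    for cur, nxt in zip(boxes, boxes[1:]):
        # Right edge of the current box minus left edge of the next one.
        overlap = cur[1][0] - nxt[0][0]
        if overlap > 0:
            cur[1][0] -= overlap / 2  # top-right of current
            cur[2][0] -= overlap / 2  # bottom-right of current
            nxt[0][0] += overlap / 2  # top-left of next
            nxt[3][0] += overlap / 2  # bottom-left of next
    return boxes


# Hypothetical input: the two boxes overlap by 4 pixels on the x-axis.
boxes = [
    [[0.0, 0.0], [12.0, 0.0], [12.0, 10.0], [0.0, 10.0]],
    [[8.0, 0.0], [20.0, 0.0], [20.0, 10.0], [8.0, 10.0]],
]
result = split_overlap(boxes)
```

As in the committed code, only the top-right x of one box is compared with the top-left x of the next, so the input is expected to be sorted left to right already.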

+ 7 - 0
python/RapidOCR/python/rapidocr/ch_ppocr_cls/__init__.py

@@ -0,0 +1,7 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from .main import TextClassifier
+from .utils import TextClsOutput
+
+__all__ = ["TextClassifier", "TextClsOutput"]

+ 98 - 0
python/RapidOCR/python/rapidocr/ch_ppocr_cls/main.py

@@ -0,0 +1,98 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import copy
+import math
+import time
+from typing import Any, Dict, List, Union
+
+import cv2
+import numpy as np
+
+from rapidocr.inference_engine.base import get_engine
+
+from .utils import ClsPostProcess, TextClsOutput
+
+
+class TextClassifier:
+    def __init__(self, cfg: Dict[str, Any]):
+        self.cls_image_shape = cfg["cls_image_shape"]
+        self.cls_batch_num = cfg["cls_batch_num"]
+        self.cls_thresh = cfg["cls_thresh"]
+        self.postprocess_op = ClsPostProcess(cfg["label_list"])
+
+        self.session = get_engine(cfg.engine_type)(cfg)
+
+    def __call__(self, img_list: Union[np.ndarray, List[np.ndarray]]) -> TextClsOutput:
+        start_time = time.perf_counter()
+
+        if isinstance(img_list, np.ndarray):
+            img_list = [img_list]
+
+        img_list = copy.deepcopy(img_list)
+
+        # Calculate the aspect ratio of all text bars
+        width_list = [img.shape[1] / float(img.shape[0]) for img in img_list]
+
+        # Sorting can speed up the cls process
+        indices = np.argsort(np.array(width_list))
+
+        img_num = len(img_list)
+        cls_res = [("", 0.0)] * img_num
+        batch_num = self.cls_batch_num
+        elapse = 0
+        for beg_img_no in range(0, img_num, batch_num):
+            end_img_no = min(img_num, beg_img_no + batch_num)
+
+            norm_img_batch = []
+            for ino in range(beg_img_no, end_img_no):
+                norm_img = self.resize_norm_img(img_list[indices[ino]])
+                norm_img = norm_img[np.newaxis, :]
+                norm_img_batch.append(norm_img)
+            norm_img_batch = np.concatenate(norm_img_batch).astype(np.float32)
+
+            prob_out = self.session(norm_img_batch)
+            cls_result = self.postprocess_op(prob_out)
+
+            for rno, (label, score) in enumerate(cls_result):
+                cls_res[indices[beg_img_no + rno]] = (label, score)
+                if "180" in label and score > self.cls_thresh:
+                    img_list[indices[beg_img_no + rno]] = cv2.rotate(
+                        img_list[indices[beg_img_no + rno]], cv2.ROTATE_180
+                    )
+
+        elapse = time.perf_counter() - start_time
+        return TextClsOutput(img_list=img_list, cls_res=cls_res, elapse=elapse)
+
+    def resize_norm_img(self, img: np.ndarray) -> np.ndarray:
+        img_c, img_h, img_w = self.cls_image_shape
+        h, w = img.shape[:2]
+        ratio = w / float(h)
+        if math.ceil(img_h * ratio) > img_w:
+            resized_w = img_w
+        else:
+            resized_w = int(math.ceil(img_h * ratio))
+
+        resized_image = cv2.resize(img, (resized_w, img_h))
+        resized_image = resized_image.astype("float32")
+        if img_c == 1:
+            resized_image = resized_image / 255
+            resized_image = resized_image[np.newaxis, :]
+        else:
+            resized_image = resized_image.transpose((2, 0, 1)) / 255
+
+        resized_image -= 0.5
+        resized_image /= 0.5
+        padding_im = np.zeros((img_c, img_h, img_w), dtype=np.float32)
+        padding_im[:, :, :resized_w] = resized_image
+        return padding_im
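
Review note on `TextClassifier.__call__` above: the loop processes crops in ascending aspect-ratio order and writes each result back to its original position through `indices`. The sketch below isolates that pattern in pure Python. `run_in_sorted_batches` and `fake_infer` are hypothetical names, and the stand-in inference function only echoes the indices it receives.

```python
from typing import Callable, List, Sequence


def run_in_sorted_batches(
    ratios: Sequence[float],
    batch_size: int,
    infer: Callable[[List[int]], List[str]],
) -> List[str]:
    """Process items in ascending-ratio order, return results in input order."""
    # Equivalent in spirit to np.argsort(width_list): positions sorted by ratio.
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    results = [""] * len(ratios)
    for start in range(0, len(order), batch_size):
        batch = order[start : start + batch_size]
        for pos, out in enumerate(infer(batch)):
            # Scatter each output back to the slot of the original item.
            results[batch[pos]] = out
    return results


batches_seen: List[List[int]] = []


def fake_infer(batch: List[int]) -> List[str]:
    """Stand-in for the model session: records the batch and labels each item."""
    batches_seen.append(list(batch))
    return [f"item{i}" for i in batch]


ratios = [5.0, 1.5, 9.0, 2.0]
results = run_in_sorted_batches(ratios, 2, fake_infer)
```

Grouping similar aspect ratios keeps the crops in a batch close in width, which is presumably why the source comment says sorting speeds up classification.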

+ 79 - 0
python/RapidOCR/python/rapidocr/ch_ppocr_cls/utils.py

@@ -0,0 +1,79 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from dataclasses import dataclass
+from pathlib import Path
+from typing import List, Optional, Tuple, Union
+
+import cv2
+import numpy as np
+
+from ..utils.log import logger
+from ..utils.utils import save_img
+from ..utils.vis_res import VisRes
+
+
+@dataclass
+class TextClsOutput:
+    img_list: Optional[List[np.ndarray]] = None
+    cls_res: Optional[List[Tuple[str, float]]] = None
+    elapse: Optional[float] = None
+
+    def __len__(self):
+        if self.img_list is None:
+            return 0
+        return len(self.img_list)
+
+    def vis(self, save_path: Optional[Union[str, Path]] = None) -> Optional[np.ndarray]:
+        if self.img_list is None or self.cls_res is None:
+            logger.warning("No image or txts to visualize.")
+            return None
+
+        vis = VisRes()
+
+        txts = [f"{txt} {score:.2f}" for txt, score in self.cls_res]
+        img_degrees, scores = list(zip(*self.cls_res))
+
+        raw_img_list = self.restore_image_orientation(self.img_list, img_degrees)
+        vis_img = vis.draw_rec_res(raw_img_list, txts, list(scores))
+
+        if save_path is not None:
+            save_img(save_path, vis_img)
+            logger.info("Visualization saved as %s", save_path)
+        return vis_img
+
+    def restore_image_orientation(
+        self, img_list: List[np.ndarray], img_degrees: Tuple[str]
+    ) -> List[np.ndarray]:
+        results = []
+        for img, rotate_degree in zip(img_list, img_degrees):
+            if rotate_degree != "180":
+                results.append(img)
+                continue
+
+            rotate_img = cv2.rotate(img, cv2.ROTATE_180)
+            results.append(rotate_img)
+        return results
+
+
+class ClsPostProcess:
+    def __init__(self, label_list: List[str]):
+        self.label_list = label_list
+
+    def __call__(self, preds: np.ndarray) -> List[Tuple[str, float]]:
+        pred_idxs = preds.argmax(axis=1)
+        decode_out = [
+            (self.label_list[int(idx)], preds[i, int(idx)])
+            for i, idx in enumerate(pred_idxs)
+        ]
+        return decode_out
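
Review note on `ClsPostProcess` above: decoding is a per-row argmax over the class probabilities, and the caller in `main.py` rotates a crop only when the label contains "180" and the score exceeds `cls_thresh`. A pure-Python sketch under those assumptions follows. The label list `["0", "180"]` and the threshold 0.7 are illustrative values, since the real ones come from the config.

```python
from typing import List, Sequence, Tuple


def decode_cls(
    probs: Sequence[Sequence[float]], labels: Sequence[str]
) -> List[Tuple[str, float]]:
    """Pick the highest-probability label for every row."""
    decoded = []
    for row in probs:
        best = max(range(len(row)), key=lambda i: row[i])  # first max wins on ties
        decoded.append((labels[best], row[best]))
    return decoded


def should_rotate(label: str, score: float, thresh: float) -> bool:
    """Mirror of the rotation condition used in TextClassifier.__call__."""
    return "180" in label and score > thresh


labels = ["0", "180"]  # assumed label list
decoded = decode_cls([[0.9, 0.1], [0.2, 0.8], [0.45, 0.55]], labels)
flags = [should_rotate(label, score, 0.7) for label, score in decoded]
```

The third row is classified as "180" but stays unrotated because its score is below the assumed threshold.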

+ 7 - 0
python/RapidOCR/python/rapidocr/ch_ppocr_det/__init__.py

@@ -0,0 +1,7 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from .main import TextDetector
+from .utils import TextDetOutput
+
+__all__ = ["TextDetector", "TextDetOutput"]

+ 104 - 0
python/RapidOCR/python/rapidocr/ch_ppocr_det/main.py

@@ -0,0 +1,104 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+import time
+from typing import Any, Dict
+
+import numpy as np
+
+from rapidocr.inference_engine.base import get_engine
+
+from .utils import DBPostProcess, DetPreProcess, TextDetOutput
+
+_BOX_SORT_Y_THRESHOLD = 10
+
+
+class TextDetector:
+    def __init__(self, cfg: Dict[str, Any]):
+        self.limit_side_len = cfg.get("limit_side_len")
+        self.limit_type = cfg.get("limit_type")
+        self.mean = cfg.get("mean")
+        self.std = cfg.get("std")
+        self.preprocess_op = None
+
+        post_process = {
+            "thresh": cfg.get("thresh", 0.3),
+            "box_thresh": cfg.get("box_thresh", 0.5),
+            "max_candidates": cfg.get("max_candidates", 1000),
+            "unclip_ratio": cfg.get("unclip_ratio", 1.6),
+            "use_dilation": cfg.get("use_dilation", True),
+            "score_mode": cfg.get("score_mode", "fast"),
+        }
+        self.postprocess_op = DBPostProcess(**post_process)
+
+        self.session = get_engine(cfg.engine_type)(cfg)
+
+    def __call__(self, img: np.ndarray) -> TextDetOutput:
+        start_time = time.perf_counter()
+
+        if img is None:
+            raise ValueError("img is None")
+
+        ori_img_shape = img.shape[0], img.shape[1]
+        self.preprocess_op = self.get_preprocess(max(img.shape[0], img.shape[1]))
+        prepro_img = self.preprocess_op(img)
+        if prepro_img is None:
+            return TextDetOutput()
+
+        preds = self.session(prepro_img)
+        boxes, scores = self.postprocess_op(preds, ori_img_shape)
+        if len(boxes) < 1:
+            return TextDetOutput()
+
+        boxes = self.sorted_boxes(boxes)
+        elapse = time.perf_counter() - start_time
+        return TextDetOutput(img, boxes, scores, elapse=elapse)
+
+    def get_preprocess(self, max_wh: int) -> DetPreProcess:
+        if self.limit_type == "min":
+            limit_side_len = self.limit_side_len
+        elif max_wh < 960:
+            limit_side_len = 960
+        elif max_wh < 1500:
+            limit_side_len = 1500
+        else:
+            limit_side_len = 2000
+        return DetPreProcess(limit_side_len, self.limit_type, self.mean, self.std)
+
+    @staticmethod
+    def sorted_boxes(dt_boxes: np.ndarray) -> np.ndarray:
+        """
+        Equivalent NumPy implementation of the original bubble-adjusted sort.
+        """
+        if len(dt_boxes) == 0:
+            return dt_boxes
+
+        # Step 1: Stable sort by y (top to bottom)
+        y_coords = dt_boxes[:, 0, 1]
+        y_order = np.argsort(y_coords, kind="stable")
+        boxes_y_sorted = dt_boxes[y_order]
+        y_sorted = y_coords[y_order]
+
+        # Step 2: Assign line IDs based on adjacent y differences
+        dy = np.diff(y_sorted)
+        line_increments = (dy >= _BOX_SORT_Y_THRESHOLD).astype(np.int32)
+        line_ids = np.concatenate([[0], np.cumsum(line_increments)])
+
+        # Now, within each line_id group, sort by x (left to right)
+        x_coords = boxes_y_sorted[:, 0, 0]
+        final_order_in_y_sorted = np.lexsort((x_coords, line_ids))
+
+        return boxes_y_sorted[final_order_in_y_sorted]
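
Review note on `sorted_boxes` above: the NumPy version sorts by the top-left y, starts a new line whenever two consecutive y values differ by at least `_BOX_SORT_Y_THRESHOLD`, and then orders each line by x. The sketch below restates that logic in pure Python on top-left corners only. `reading_order` and the sample points are illustrative.

```python
from typing import List, Tuple

Y_THRESHOLD = 10  # same value as _BOX_SORT_Y_THRESHOLD in the diff


def reading_order(points: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Order (x, y) top-left corners top-to-bottom, then left-to-right per line."""
    by_y = sorted(points, key=lambda p: p[1])  # Python's sort is stable
    keyed = []
    line_id = 0
    for i, point in enumerate(by_y):
        # A jump of at least Y_THRESHOLD between neighbours opens a new line.
        if i > 0 and point[1] - by_y[i - 1][1] >= Y_THRESHOLD:
            line_id += 1
        keyed.append((line_id, point[0], point))
    # Primary key: line id; secondary key: x, like np.lexsort((x, line_ids)).
    keyed.sort(key=lambda item: (item[0], item[1]))
    return [point for _, _, point in keyed]


corners = [(200, 12), (10, 15), (100, 8), (50, 40), (5, 44)]
ordered = reading_order(corners)
```

One property worth checking in review: because only adjacent differences are compared, a run of boxes that each drift by less than the threshold ends up on one line, however large the total drift is.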

+ 319 - 0
python/RapidOCR/python/rapidocr/ch_ppocr_det/utils.py

@@ -0,0 +1,319 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from dataclasses import dataclass
+from typing import List, Optional, Tuple
+
+import cv2
+import numpy as np
+import pyclipper
+from shapely.geometry import Polygon
+
+from ..utils.log import logger
+from ..utils.utils import save_img
+from ..utils.vis_res import VisRes
+
+
+@dataclass
+class TextDetOutput:
+    img: Optional[np.ndarray] = None
+    boxes: Optional[np.ndarray] = None
+    scores: Optional[List[float]] = None
+    elapse: float = 0.0
+
+    def __len__(self):
+        if self.boxes is None:
+            return 0
+        return len(self.boxes)
+
+    def vis(self, save_path: Optional[str] = None) -> Optional[np.ndarray]:
+        if self.img is None or self.boxes is None or self.scores is None:
+            logger.warning("No image or boxes to visualize.")
+            return None
+
+        vis = VisRes()
+        vis_img = vis.draw_dt_boxes(self.img, self.boxes, self.scores)
+
+        if save_path is not None:
+            save_img(save_path, vis_img)
+            logger.info("Visualization saved as %s", save_path)
+        return vis_img
+
+
+class DetPreProcess:
+    def __init__(
+        self, limit_side_len: int = 736, limit_type: str = "min", mean=None, std=None
+    ):
+        if mean is None:
+            mean = [0.5, 0.5, 0.5]
+
+        if std is None:
+            std = [0.5, 0.5, 0.5]
+
+        self.mean = np.array(mean)
+        self.std = np.array(std)
+        self.scale = 1 / 255.0
+
+        self.limit_side_len = limit_side_len
+        self.limit_type = limit_type
+
+    def __call__(self, img: np.ndarray) -> Optional[np.ndarray]:
+        resized_img = self.resize(img)
+        if resized_img is None:
+            return None
+
+        img = self.normalize(resized_img)
+        img = self.permute(img)
+        img = np.expand_dims(img, axis=0).astype(np.float32)
+        return img
+
+    def normalize(self, img: np.ndarray) -> np.ndarray:
+        return (img.astype("float32") * self.scale - self.mean) / self.std
+
+    def permute(self, img: np.ndarray) -> np.ndarray:
+        return img.transpose((2, 0, 1))
+
+    def resize(self, img: np.ndarray) -> Optional[np.ndarray]:
+        """resize image to a size multiple of 32 which is required by the network"""
+        h, w = img.shape[:2]
+
+        if self.limit_type == "max":
+            if max(h, w) > self.limit_side_len:
+                if h > w:
+                    ratio = float(self.limit_side_len) / h
+                else:
+                    ratio = float(self.limit_side_len) / w
+            else:
+                ratio = 1.0
+        else:
+            if min(h, w) < self.limit_side_len:
+                if h < w:
+                    ratio = float(self.limit_side_len) / h
+                else:
+                    ratio = float(self.limit_side_len) / w
+            else:
+                ratio = 1.0
+
+        resize_h = int(h * ratio)
+        resize_w = int(w * ratio)
+
+        resize_h = int(round(resize_h / 32) * 32)
+        resize_w = int(round(resize_w / 32) * 32)
+
+        try:
+            if int(resize_w) <= 0 or int(resize_h) <= 0:
+                return None
+            img = cv2.resize(img, (int(resize_w), int(resize_h)))
+        except Exception as exc:
+            raise ResizeImgError from exc
+
+        return img
+
+
+class ResizeImgError(Exception):
+    pass
+
+
+class DBPostProcess:
+    """The post process for Differentiable Binarization (DB)."""
+
+    def __init__(
+        self,
+        thresh: float = 0.3,
+        box_thresh: float = 0.7,
+        max_candidates: int = 1000,
+        unclip_ratio: float = 2.0,
+        score_mode: str = "fast",
+        use_dilation: bool = False,
+    ):
+        self.thresh = thresh
+        self.box_thresh = box_thresh
+        self.max_candidates = max_candidates
+        self.unclip_ratio = unclip_ratio
+        self.min_size = 3
+        self.score_mode = score_mode
+
+        self.dilation_kernel = None
+        if use_dilation:
+            self.dilation_kernel = np.array([[1, 1], [1, 1]])
+
+    def __call__(
+        self, pred: np.ndarray, ori_shape: Tuple[int, int]
+    ) -> Tuple[np.ndarray, List[float]]:
+        src_h, src_w = ori_shape
+        pred = pred[:, 0, :, :]
+        segmentation = pred > self.thresh
+
+        mask = segmentation[0]
+        if self.dilation_kernel is not None:
+            mask = cv2.dilate(
+                np.array(segmentation[0]).astype(np.uint8), self.dilation_kernel
+            )
+        boxes, scores = self.boxes_from_bitmap(pred[0], mask, src_w, src_h)
+        boxes, scores = self.filter_det_res(boxes, scores, src_h, src_w)
+        return boxes, scores
+
+    def boxes_from_bitmap(
+        self, pred: np.ndarray, bitmap: np.ndarray, dest_width: int, dest_height: int
+    ) -> Tuple[np.ndarray, List[float]]:
+        """
+        bitmap: single map with shape (H, W),
+                whose values are binarized as {0, 1}
+        """
+
+        height, width = bitmap.shape
+
+        outs = cv2.findContours(
+            (bitmap * 255).astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE
+        )
+        if len(outs) == 3:
+            img, contours, _ = outs[0], outs[1], outs[2]
+        elif len(outs) == 2:
+            contours, _ = outs[0], outs[1]
+
+        num_contours = min(len(contours), self.max_candidates)
+
+        boxes, scores = [], []
+        for index in range(num_contours):
+            contour = contours[index]
+            points, sside = self.get_mini_boxes(contour)
+            if sside < self.min_size:
+                continue
+
+            if self.score_mode == "fast":
+                score = self.box_score_fast(pred, points.reshape(-1, 2))
+            else:
+                score = self.box_score_slow(pred, contour)
+
+            if self.box_thresh > score:
+                continue
+
+            box = self.unclip(points)
+            box, sside = self.get_mini_boxes(box)
+            if sside < self.min_size + 2:
+                continue
+
+            box[:, 0] = np.clip(np.round(box[:, 0] / width * dest_width), 0, dest_width)
+            box[:, 1] = np.clip(
+                np.round(box[:, 1] / height * dest_height), 0, dest_height
+            )
+            boxes.append(box.astype(np.int32))
+            scores.append(score)
+        return np.array(boxes, dtype=np.int32), scores
+
+    def get_mini_boxes(self, contour: np.ndarray) -> Tuple[np.ndarray, float]:
+        bounding_box = cv2.minAreaRect(contour)
+        points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])
+
+        index_1, index_2, index_3, index_4 = 0, 1, 2, 3
+        if points[1][1] > points[0][1]:
+            index_1 = 0
+            index_4 = 1
+        else:
+            index_1 = 1
+            index_4 = 0
+
+        if points[3][1] > points[2][1]:
+            index_2 = 2
+            index_3 = 3
+        else:
+            index_2 = 3
+            index_3 = 2
+
+        box = np.array(
+            [points[index_1], points[index_2], points[index_3], points[index_4]]
+        )
+        return box, min(bounding_box[1])
+
+    @staticmethod
+    def box_score_fast(bitmap: np.ndarray, _box: np.ndarray) -> float:
+        h, w = bitmap.shape[:2]
+        box = _box.copy()
+        xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int32), 0, w - 1)
+        xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int32), 0, w - 1)
+        ymin = np.clip(np.floor(box[:, 1].min()).astype(np.int32), 0, h - 1)
+        ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int32), 0, h - 1)
+
+        mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
+        box[:, 0] = box[:, 0] - xmin
+        box[:, 1] = box[:, 1] - ymin
+        cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1)
+        return cv2.mean(bitmap[ymin : ymax + 1, xmin : xmax + 1], mask)[0]
+
+    def box_score_slow(self, bitmap: np.ndarray, contour: np.ndarray) -> float:
+        """use polygon mean score as the mean score"""
+        h, w = bitmap.shape[:2]
+        contour = contour.copy()
+        contour = np.reshape(contour, (-1, 2))
+
+        xmin = np.clip(np.min(contour[:, 0]), 0, w - 1)
+        xmax = np.clip(np.max(contour[:, 0]), 0, w - 1)
+        ymin = np.clip(np.min(contour[:, 1]), 0, h - 1)
+        ymax = np.clip(np.max(contour[:, 1]), 0, h - 1)
+
+        mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
+
+        contour[:, 0] = contour[:, 0] - xmin
+        contour[:, 1] = contour[:, 1] - ymin
+
+        cv2.fillPoly(mask, contour.reshape(1, -1, 2).astype(np.int32), 1)
+        return cv2.mean(bitmap[ymin : ymax + 1, xmin : xmax + 1], mask)[0]
+
+    def unclip(self, box: np.ndarray) -> np.ndarray:
+        unclip_ratio = self.unclip_ratio
+        poly = Polygon(box)
+        distance = poly.area * unclip_ratio / poly.length
+        offset = pyclipper.PyclipperOffset()
+        offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
+        expanded = np.array(offset.Execute(distance)).reshape((-1, 1, 2))
+        return expanded
+
+    def filter_det_res(
+        self, dt_boxes: np.ndarray, scores: List[float], img_height: int, img_width: int
+    ) -> Tuple[np.ndarray, List[float]]:
+        dt_boxes_new, new_scores = [], []
+        for box, score in zip(dt_boxes, scores):
+            box = self.order_points_clockwise(box)
+            box = self.clip_det_res(box, img_height, img_width)
+
+            rect_width = int(np.linalg.norm(box[0] - box[1]))
+            rect_height = int(np.linalg.norm(box[0] - box[3]))
+            if rect_width <= 3 or rect_height <= 3:
+                continue
+
+            dt_boxes_new.append(box)
+            new_scores.append(score)
+        return np.array(dt_boxes_new), new_scores
+
+    def order_points_clockwise(self, pts: np.ndarray) -> np.ndarray:
+        """
+        reference from:
+        https://github.com/jrosebr1/imutils/blob/master/imutils/perspective.py
+        sort the points based on their x-coordinates
+        """
+        xSorted = pts[np.argsort(pts[:, 0]), :]
+
+        # grab the left-most and right-most points from the sorted
+        # x-coordinate points
+        leftMost = xSorted[:2, :]
+        rightMost = xSorted[2:, :]
+
+        # now, sort the left-most coordinates according to their
+        # y-coordinates so we can grab the top-left and bottom-left
+        # points, respectively
+        leftMost = leftMost[np.argsort(leftMost[:, 1]), :]
+        (tl, bl) = leftMost
+
+        rightMost = rightMost[np.argsort(rightMost[:, 1]), :]
+        (tr, br) = rightMost
+
+        rect = np.array([tl, tr, br, bl], dtype="float32")
+        return rect
+
+    def clip_det_res(
+        self, points: np.ndarray, img_height: int, img_width: int
+    ) -> np.ndarray:
+        for pno in range(points.shape[0]):
+            points[pno, 0] = int(min(max(points[pno, 0], 0), img_width - 1))
+            points[pno, 1] = int(min(max(points[pno, 1], 0), img_height - 1))
+        return points
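
Review note on `DetPreProcess.resize` above: the target size is derived from a scale ratio that depends on `limit_type`, and both sides are then rounded to a multiple of 32. The sketch below reproduces only that arithmetic, without calling OpenCV. `det_target_size` is a hypothetical helper and the sample sizes are illustrative.

```python
from typing import Optional, Tuple


def det_target_size(
    h: int, w: int, limit_side_len: int, limit_type: str
) -> Optional[Tuple[int, int]]:
    """Return (resize_h, resize_w), or None if a side rounds down to zero."""
    if limit_type == "max":
        # Shrink only when the longer side exceeds the limit.
        ratio = limit_side_len / max(h, w) if max(h, w) > limit_side_len else 1.0
    else:
        # Enlarge only when the shorter side is below the limit.
        ratio = limit_side_len / min(h, w) if min(h, w) < limit_side_len else 1.0

    # Truncate first, then round to the nearest multiple of 32, as in the diff.
    resize_h = int(round(int(h * ratio) / 32) * 32)
    resize_w = int(round(int(w * ratio) / 32) * 32)
    if resize_h <= 0 or resize_w <= 0:
        return None
    return resize_h, resize_w


full_hd = det_target_size(1080, 1920, 960, "max")
upscaled = det_target_size(368, 1010, 736, "min")
tiny = det_target_size(10, 10, 960, "max")
```

Python's built-in `round` rounds exact halves to the nearest even integer, so a side whose scaled length is exactly halfway between two multiples of 32 may round down rather than up.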

+ 7 - 0
python/RapidOCR/python/rapidocr/ch_ppocr_rec/__init__.py

@@ -0,0 +1,7 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from .main import TextRecognizer
+from .typings import TextRecInput, TextRecOutput
+
+__all__ = ["TextRecognizer", "TextRecInput", "TextRecOutput"]

+ 174 - 0
python/RapidOCR/python/rapidocr/ch_ppocr_rec/main.py

@@ -0,0 +1,174 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import math
+import time
+from pathlib import Path
+from typing import Any, Dict
+
+import cv2
+import numpy as np
+
+from rapidocr.inference_engine.base import FileInfo, get_engine
+
+from ..utils.download_file import DownloadFile, DownloadFileInput
+from ..utils.log import logger
+from ..utils.typings import LangRec
+from ..utils.utils import reorder_bidi_for_display
+from ..utils.vis_res import VisRes
+from .typings import TextRecInput, TextRecOutput
+from .utils import CTCLabelDecode
+
+DEFAULT_DICT_PATH = Path(__file__).parent.parent / "models" / "ppocr_keys_v1.txt"
+DEFAULT_DICT_URL = "https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v2.0.7/paddle/PP-OCRv4/rec/ch_PP-OCRv4_rec_infer/ppocr_keys_v1.txt"
+DEFAULT_MODEL_PATH = Path(__file__).parent.parent / "models"
+
+
+class TextRecognizer:
+    def __init__(self, cfg: Dict[str, Any]):
+        self.session = get_engine(cfg.engine_type)(cfg)
+
+        # ONNX models embed the character list; other engines load or download character_dict_path
+        character, character_dict_path = self.get_character_dict(cfg)
+
+        self.postprocess_op = CTCLabelDecode(
+            character=character, character_path=character_dict_path
+        )
+
+        self.rec_batch_num = cfg["rec_batch_num"]
+        self.rec_image_shape = cfg["rec_img_shape"]
+
+        self.cfg = cfg
+
+    def get_character_dict(self, cfg):
+        character = None
+        dict_path = cfg.get("rec_keys_path", None)
+        if self.session.have_key():
+            character = self.session.get_character_list()
+            return character, dict_path
+
+        # ONNX models carry the character list; other engines need dict_path
+        if (not dict_path and not character) or (not Path(dict_path).exists()):
+            dict_download_url = self.session.get_dict_key_url(
+                FileInfo(
+                    engine_type=cfg.engine_type,
+                    ocr_version=cfg.ocr_version,
+                    task_type=cfg.task_type,
+                    lang_type=cfg.lang_type,
+                    model_type=cfg.model_type,
+                )
+            )
+            dict_download_url = (
+                dict_download_url if dict_download_url is not None else DEFAULT_DICT_URL
+            )
+            dict_path = DEFAULT_MODEL_PATH / Path(dict_download_url).name
+            if not Path(dict_path).exists():
+                DownloadFile.run(
+                    DownloadFileInput(
+                        file_url=dict_download_url,
+                        sha256=None,
+                        save_path=dict_path,
+                        logger=logger,
+                    )
+                )
+
+        return character, dict_path
+
+    def __call__(self, args: TextRecInput) -> TextRecOutput:
+        start_time = time.perf_counter()
+
+        img_list = [args.img] if isinstance(args.img, np.ndarray) else args.img
+        return_word_box = args.return_word_box
+
+        width_list = [img.shape[1] / float(img.shape[0]) for img in img_list]
+
+        # Sorting can speed up the recognition process
+        indices = np.argsort(np.array(width_list))
+
+        img_num = len(img_list)
+        rec_res = [("", 0.0)] * img_num
+
+        batch_num = self.rec_batch_num
+        elapse = 0
+        for beg_img_no in range(0, img_num, batch_num):
+            end_img_no = min(img_num, beg_img_no + batch_num)
+
+            # Parameter Alignment for PaddleOCR
+            imgC, imgH, imgW = self.rec_image_shape[:3]
+            max_wh_ratio = imgW / imgH
+            wh_ratio_list = []
+            for ino in range(beg_img_no, end_img_no):
+                h, w = img_list[indices[ino]].shape[0:2]
+                wh_ratio = w * 1.0 / h
+                max_wh_ratio = max(max_wh_ratio, wh_ratio)
+                wh_ratio_list.append(wh_ratio)
+
+            norm_img_batch = []
+            for ino in range(beg_img_no, end_img_no):
+                norm_img = self.resize_norm_img(img_list[indices[ino]], max_wh_ratio)
+                norm_img_batch.append(norm_img[np.newaxis, :])
+            norm_img_batch = np.concatenate(norm_img_batch).astype(np.float32)
+
+            preds = self.session(norm_img_batch)
+            line_results, word_results = self.postprocess_op(
+                preds,
+                return_word_box,
+                wh_ratio_list=wh_ratio_list,
+                max_wh_ratio=max_wh_ratio,
+            )
+
+            for rno, one_res in enumerate(line_results):
+                if return_word_box:
+                    rec_res[indices[beg_img_no + rno]] = (one_res, word_results[rno])
+                    continue
+
+                rec_res[indices[beg_img_no + rno]] = (one_res, None)
+
+        all_line_results, all_word_results = list(zip(*rec_res))
+        txts, scores = list(zip(*all_line_results))
+
+        if self.cfg.lang_type == LangRec.ARABIC:
+            txts = reorder_bidi_for_display(txts)
+
+        elapse = time.perf_counter() - start_time
+        return TextRecOutput(
+            img_list,
+            txts,
+            scores,
+            all_word_results,
+            elapse,
+            viser=VisRes(lang_type=self.cfg.lang_type, font_path=self.cfg.font_path),
+        )
+
+    def resize_norm_img(self, img: np.ndarray, max_wh_ratio: float) -> np.ndarray:
+        img_channel, img_height, img_width = self.rec_image_shape
+        assert img_channel == img.shape[2]
+
+        img_width = int(img_height * max_wh_ratio)
+
+        h, w = img.shape[:2]
+        ratio = w / float(h)
+        if math.ceil(img_height * ratio) > img_width:
+            resized_w = img_width
+        else:
+            resized_w = int(math.ceil(img_height * ratio))
+
+        resized_image = cv2.resize(img, (resized_w, img_height))
+        resized_image = resized_image.astype("float32")
+        resized_image = resized_image.transpose((2, 0, 1)) / 255
+        resized_image -= 0.5
+        resized_image /= 0.5
+
+        padding_im = np.zeros((img_channel, img_height, img_width), dtype=np.float32)
+        padding_im[:, :, 0:resized_w] = resized_image
+        return padding_im
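
Review note: the width arithmetic shared by `__call__` and `resize_norm_img` can be checked in isolation. The sketch below is a pure-Python restatement under stated assumptions, not code from this commit: `batch_widths` is a hypothetical helper name, and its defaults mirror `rec_img_shape: [3, 48, 320]`. It shows that every image in a batch is scaled to the model height, capped at the batch-wide padded width, and then zero-padded on the right.

```python
import math
from typing import List, Tuple


def batch_widths(
    shapes: List[Tuple[int, int]], img_height: int = 48, base_width: int = 320
) -> Tuple[int, List[int]]:
    """Return (padded_width, resized widths) for one batch of (h, w) shapes.

    Hypothetical helper: it mirrors the arithmetic of the diff above
    without touching any pixels.
    """
    # The batch ratio starts from the configured shape and only grows.
    max_wh_ratio = base_width / img_height
    for h, w in shapes:
        max_wh_ratio = max(max_wh_ratio, w / h)

    # Every image in the batch is padded to this common width.
    padded_width = int(img_height * max_wh_ratio)

    resized = []
    for h, w in shapes:
        # Keep the aspect ratio at the model height, capped at the padded width.
        resized.append(min(padded_width, math.ceil(img_height * (w / h))))
    return padded_width, resized


padded, resized = batch_widths([(32, 100), (48, 960)])
print(padded, resized)
```

Because the padded width follows the widest image in the batch, sorting by aspect ratio first (as `__call__` does with `np.argsort`) keeps images of similar width together and limits wasted padding.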

+ 77 - 0
python/RapidOCR/python/rapidocr/ch_ppocr_rec/typings.py

@@ -0,0 +1,77 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from dataclasses import dataclass, field
+from enum import Enum
+from pathlib import Path
+from typing import List, Optional, Tuple, Union
+
+import numpy as np
+
+from ..utils.log import logger
+from ..utils.utils import save_img
+from ..utils.vis_res import VisRes
+
+
+@dataclass
+class TextRecConfig:
+    intra_op_num_threads: int = -1
+    inter_op_num_threads: int = -1
+    use_cuda: bool = False
+    use_dml: bool = False
+    model_path: Union[str, Path, None] = None
+
+    rec_batch_num: int = 6
+    rec_img_shape: Tuple[int, int, int] = (3, 48, 320)
+    rec_keys_path: Union[str, Path, None] = None
+
+
+@dataclass
+class TextRecInput:
+    img: Union[np.ndarray, List[np.ndarray], None] = None
+    return_word_box: bool = False
+
+
+@dataclass
+class TextRecOutput:
+    imgs: Optional[List[np.ndarray]] = None
+    txts: Optional[Tuple[str, ...]] = None
+    scores: List[float] = field(default_factory=lambda: [1.0])
+    word_results: Tuple[Tuple[str, float, Optional[List[List[int]]]]] = (
+        ("", 1.0, None),
+    )
+    elapse: Optional[float] = None
+    viser: Optional[VisRes] = None
+
+    def __len__(self):
+        if self.txts is None:
+            return 0
+        return len(self.txts)
+
+    def vis(self, save_path: Optional[Union[str, Path]] = None) -> Optional[np.ndarray]:
+        if self.imgs is None or self.txts is None:
+            logger.warning("No image or txts to visualize.")
+            return None
+
+        vis_img = self.viser.draw_rec_res(self.imgs, self.txts, self.scores)
+
+        if save_path is not None:
+            save_img(save_path, vis_img)
+            logger.info("Visualization saved as %s", save_path)
+        return vis_img
+
+
+class WordType(Enum):
+    CN = "cn"
+    EN = "en"
+    NUM = "num"
+    EN_NUM = "en&num"
+
+
+@dataclass
+class WordInfo:
+    words: List[List[str]] = field(default_factory=list)
+    word_cols: List[List[int]] = field(default_factory=list)
+    word_types: List[WordType] = field(default_factory=list)
+    line_txt_len: float = 0.0
+    confs: List[float] = field(default_factory=list)

+ 186 - 0
python/RapidOCR/python/rapidocr/ch_ppocr_rec/utils.py

@@ -0,0 +1,186 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from pathlib import Path
+from typing import Any, List, Optional, Tuple, Union
+
+import numpy as np
+
+from ..utils.utils import has_chinese_char
+from .typings import WordInfo, WordType
+
+
+class CTCLabelDecode:
+    def __init__(
+        self,
+        character: Optional[List[str]] = None,
+        character_path: Union[str, Path, None] = None,
+    ):
+        self.character = self.get_character(character, character_path)
+        self.dict = {char: i for i, char in enumerate(self.character)}
+
+    def __call__(
+        self, preds: np.ndarray, return_word_box: bool = False, **kwargs
+    ) -> Tuple[List[Tuple[str, float]], List[Any]]:
+        preds_idx = preds.argmax(axis=2)
+        preds_prob = preds.max(axis=2)
+
+        wh_ratio_list = kwargs.get("wh_ratio_list", (1.0,))
+        max_wh_ratio = kwargs.get("max_wh_ratio", 1.0)
+
+        line_results, word_results = self.decode(
+            preds_idx,
+            preds_prob,
+            return_word_box,
+            wh_ratio_list,
+            max_wh_ratio,
+            remove_duplicate=True,
+        )
+        return line_results, word_results
+
+    def get_character(
+        self,
+        character: Optional[List[str]] = None,
+        character_path: Union[str, Path, None] = None,
+    ) -> List[str]:
+        if character is None and character_path is None:
+            raise ValueError("Either character or character_path must be provided")
+
+        character_list = None
+        if character:
+            # Copy so the special tokens inserted below do not mutate the caller's list
+            character_list = list(character)
+
+        if character is None and character_path is not None:
+            character_list = self.read_character_file(character_path)
+
+        if character_list is None:
+            raise ValueError("character must not be empty")
+
+        character_list = self.insert_special_char(
+            character_list, " ", len(character_list)
+        )
+        character_list = self.insert_special_char(character_list, "blank", 0)
+        return character_list
+
+    @staticmethod
+    def read_character_file(character_path: Union[str, Path]) -> List[str]:
+        character_list = []
+        with open(character_path, "rb") as f:
+            lines = f.readlines()
+            for line in lines:
+                line = line.decode("utf-8").strip("\r\n")
+                character_list.append(line)
+        return character_list
+
+    @staticmethod
+    def insert_special_char(
+        character_list: List[str], special_char: str, loc: int = -1
+    ) -> List[str]:
+        character_list.insert(loc, special_char)
+        return character_list
+
+    def decode(
+        self,
+        text_index: np.ndarray,
+        text_prob: Optional[np.ndarray] = None,
+        return_word_box: bool = False,
+        wh_ratio_list: Tuple[float] = (1.0,),
+        max_wh_ratio: float = 1.0,
+        remove_duplicate: bool = False,
+    ) -> Tuple[List[Tuple[str, float]], List[WordInfo]]:
+        result_list, result_words_list = [], []
+        ignored_tokens = self.get_ignored_tokens()
+        batch_size = len(text_index)
+        for batch_idx in range(batch_size):
+            token_indices = text_index[batch_idx]
+
+            selection = np.ones(len(token_indices), dtype=bool)
+            if remove_duplicate:
+                selection[1:] = token_indices[1:] != token_indices[:-1]
+
+            for ignored_token in ignored_tokens:
+                selection &= token_indices != ignored_token
+
+            if text_prob is not None:
+                conf_list = np.array(text_prob[batch_idx][selection]).tolist()
+                conf_list = [round(conf, 5) for conf in conf_list]
+            else:
+                # One confidence per kept character, not per time step
+                conf_list = [1] * int(selection.sum())
+
+            if len(conf_list) == 0:
+                conf_list = [0]
+
+            char_list = [
+                self.character[text_id] for text_id in token_indices[selection]
+            ]
+            text = "".join(char_list)
+
+            result_list.append((text, np.mean(conf_list).round(5).tolist()))
+
+            if return_word_box:
+                rec_word_info = self.get_word_info(text, selection)
+                rec_word_info.line_txt_len = (
+                    len(token_indices) * wh_ratio_list[batch_idx] / max_wh_ratio
+                )
+                rec_word_info.confs = conf_list
+                result_words_list.append(rec_word_info)
+        return result_list, result_words_list
+
+    @staticmethod
+    def get_word_info(text: str, selection: np.ndarray) -> WordInfo:
+        """
+        Group the decoded characters and record the corresponding decoded positions.
+        from https://github.com/PaddlePaddle/PaddleOCR/blob/fbba2178d7093f1dffca65a5b963ec277f1a6125/ppocr/postprocess/rec_postprocess.py#L70
+        """
+        word_list = []
+        word_col_list = []
+        state_list = []
+
+        word_content = []
+        word_col_content = []
+
+        valid_col = np.where(selection)[0]
+        if len(valid_col) <= 0:
+            return WordInfo()
+
+        col_width = np.zeros(valid_col.shape)
+        col_width[1:] = valid_col[1:] - valid_col[:-1]
+        col_width[0] = min(3 if has_chinese_char(text[0]) else 2, int(valid_col[0]))
+
+        state = None
+        for c_i, char in enumerate(text):
+            if char.isspace():
+                if word_content:
+                    word_list.append(word_content)
+                    word_col_list.append(word_col_content)
+                    state_list.append(state)
+                    word_content = []
+                    word_col_content = []
+                continue
+
+            c_state = WordType.CN if has_chinese_char(char) else WordType.EN_NUM
+            if state is None:
+                state = c_state
+
+            if state != c_state or col_width[c_i] > 5:
+                if len(word_content) != 0:
+                    word_list.append(word_content)
+                    word_col_list.append(word_col_content)
+                    state_list.append(state)
+                    word_content = []
+                    word_col_content = []
+                state = c_state
+
+            word_content.append(char)
+            word_col_content.append(int(valid_col[c_i]))
+
+        if len(word_content) != 0:
+            word_list.append(word_content)
+            word_col_list.append(word_col_content)
+            state_list.append(state)
+
+        return WordInfo(words=word_list, word_cols=word_col_list, word_types=state_list)
+
+    @staticmethod
+    def get_ignored_tokens() -> List[int]:
+        return [0]  # for ctc blank
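
Review note: the masking logic in `decode` is a greedy CTC collapse. A minimal pure-Python sketch of the same rule follows, assuming the dictionary layout built by `get_character` (`"blank"` at index 0, a space appended last). `build_charset` and `ctc_greedy_decode` are hypothetical names used only for illustration, and the sketch leaves out confidences and word boxes.

```python
from typing import List, Sequence


def build_charset(chars: Sequence[str]) -> List[str]:
    """Hypothetical helper: blank token first, space last, as in get_character."""
    return ["blank", *chars, " "]


def ctc_greedy_decode(
    indices: Sequence[int], charset: Sequence[str], blank: int = 0
) -> str:
    """Collapse consecutive repeats, then drop the blank token."""
    chars = []
    prev = None
    for idx in indices:
        # Keep a step only if it differs from the previous step and is not blank.
        if idx != prev and idx != blank:
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)


charset = build_charset(["a", "b"])
print(repr(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0, 3], charset)))
```

The blank between the two runs of index 1 is what lets a repeated character survive the collapse; without it, "aa" would be merged into a single "a".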

+ 26 - 0
python/RapidOCR/python/rapidocr/cli.py

@@ -0,0 +1,26 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+import shutil
+from pathlib import Path
+
+root_dir = Path(__file__).resolve().parent
+DEFAULT_CFG_PATH = root_dir / "config.yaml"
+
+
+def generate_cfg(args):
+    if args.save_cfg_file is None:
+        args.save_cfg_file = "./default_rapidocr.yaml"
+
+    shutil.copyfile(DEFAULT_CFG_PATH, args.save_cfg_file)
+    print(f"The config file has been saved to {args.save_cfg_file}")
+
+
+def check_install(ocr_engine):
+    img_url = "https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.1.0/resources/test_files/ch_en_num.jpg"
+    result = ocr_engine(img_url)
+
+    if result.txts is None or result.txts[0] != "正品促销":
+        raise ValueError("The installation is incorrect!")
+
+    print("Success! rapidocr is installed correctly!")

+ 177 - 0
python/RapidOCR/python/rapidocr/config.yaml

@@ -0,0 +1,177 @@
+Global:
+    text_score: 0.5
+
+    use_det: true
+    use_cls: true
+    use_rec: true
+
+    min_height: 30
+    width_height_ratio: 8
+    max_side_len: 2000
+    min_side_len: 30
+
+    return_word_box: false
+    return_single_char_box: false
+
+    font_path: null
+    log_level: "info" # debug / info / warning / error / critical
+
+EngineConfig:
+    onnxruntime:
+        intra_op_num_threads: -1
+        inter_op_num_threads: -1
+        enable_cpu_mem_arena: false
+
+        cpu_ep_cfg:
+            arena_extend_strategy: "kSameAsRequested"
+
+        use_cuda: false
+        cuda_ep_cfg:
+            device_id: 0
+            arena_extend_strategy: "kNextPowerOfTwo"
+            cudnn_conv_algo_search: "EXHAUSTIVE"
+            do_copy_in_default_stream: true
+
+        use_dml: false
+        dm_ep_cfg: null
+
+        use_cann: false
+        cann_ep_cfg:
+            device_id: 0
+            arena_extend_strategy: "kNextPowerOfTwo"
+            npu_mem_limit: 21474836480 # 20 * 1024 * 1024 * 1024
+            op_select_impl_mode: "high_performance"
+            optypelist_for_implmode: "Gelu"
+            enable_cann_graph: true
+
+        use_coreml: false
+        coreml_ep_cfg: 
+            ModelFormat: "MLProgram"
+            MLComputeUnits: "ALL"
+            RequireStaticInputShapes: 0
+            EnableOnSubgraphs: 0
+            SpecializationStrategy: "FastPrediction"
+            ProfileComputePlan: 0
+            AllowLowPrecisionAccumulationOnGPU: 0
+            ModelCacheDirectory: "/tmp/RapidOCR"
+
+    openvino:
+        inference_num_threads: -1
+        performance_hint: null
+        performance_num_requests: -1
+        enable_cpu_pinning: null
+        num_streams: -1
+        enable_hyper_threading: null
+        scheduling_core_type: null
+
+    paddle:
+        cpu_math_library_num_threads: -1
+
+        use_npu: false
+        npu_ep_cfg:
+            device_id: 0
+            envs:
+                FLAGS_npu_jit_compile: 0
+                FLAGS_use_stride_kernel: 0
+                FLAGS_allocator_strategy: "auto_growth"
+                CUSTOM_DEVICE_BLACK_LIST: "pad3d,pad3d_grad,set_value,set_value_with_tensor"
+                FLAGS_npu_scale_aclnn: "True"
+                FLAGS_npu_split_aclnn: "True"
+
+        use_cuda: false
+        cuda_ep_cfg:
+            device_id: 0
+            gpu_mem: 500
+
+    torch:
+        use_cuda: false
+        cuda_ep_cfg:
+            device_id: 0
+
+        use_npu: false
+        npu_ep_cfg:
+            device_id: 0
+
+        use_mps: false
+
+    tensorrt:
+        device_id: 0
+        use_fp16: true
+        use_int8: false
+        workspace_size: 1073741824  # 1GB = 1 << 30
+
+        # Engine caching
+        cache_dir: null  # null = use default models dir
+        force_rebuild: false
+
+        # Dynamic shape optimization profiles
+        det_profile:
+            min_shape: [1, 3, 32, 32]
+            opt_shape: [1, 3, 736, 736]
+            max_shape: [1, 3, 2048, 2048]
+
+        rec_profile:
+            min_shape: [1, 3, 48, 32]
+            opt_shape: [6, 3, 48, 320]
+            max_shape: [6, 3, 48, 2048]
+
+        cls_profile:
+            min_shape: [1, 3, 48, 32]
+            opt_shape: [6, 3, 48, 192]
+            max_shape: [6, 3, 48, 192]
+
+    mnn: {}
+
+Det:
+    engine_type: "onnxruntime"
+    lang_type: "ch"
+    model_type: "mobile"
+    ocr_version: "PP-OCRv4"
+
+    task_type: "det"
+
+    model_path: null
+    model_dir: null
+
+    limit_side_len: 736
+    limit_type: min
+    std: [ 0.5, 0.5, 0.5 ]
+    mean: [ 0.5, 0.5, 0.5 ]
+
+    thresh: 0.3
+    box_thresh: 0.5
+    max_candidates: 1000
+    unclip_ratio: 1.6
+    use_dilation: true
+    score_mode: fast
+
+Cls:
+    engine_type: "onnxruntime"
+    lang_type: "ch"
+    model_type: "mobile"
+    ocr_version: "PP-OCRv4"
+
+    task_type: "cls"
+
+    model_path: null
+    model_dir: null
+
+    cls_image_shape: [3, 48, 192]
+    cls_batch_num: 6
+    cls_thresh: 0.9
+    label_list: ["0", "180"]
+
+Rec:
+    engine_type: "onnxruntime"
+    lang_type: "ch"
+    model_type: "mobile"
+    ocr_version: "PP-OCRv4"
+
+    task_type: "rec"
+
+    model_path: null
+    model_dir: null
+
+    rec_keys_path: null
+    rec_img_shape: [3, 48, 320]
+    rec_batch_num: 6
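
Review note: the nested layout above (`Global`, `EngineConfig`, `Det`, `Cls`, `Rec`) lends itself to dotted-key overrides such as `Rec.rec_batch_num`. The sketch below is one way to apply such overrides to an already-loaded mapping. It is an illustration under stated assumptions, not the loader shipped in this commit: `apply_overrides` is a hypothetical helper, and the `defaults` dict stands in for the parsed YAML.

```python
import copy
from typing import Any, Dict


def apply_overrides(cfg: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of cfg with dotted-key overrides applied.

    Hypothetical helper: unknown keys raise KeyError, so a typo cannot
    silently create a new setting.
    """
    merged = copy.deepcopy(cfg)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = merged
        for part in parents:
            if not isinstance(node.get(part), dict):
                raise KeyError(dotted_key)
            node = node[part]
        if leaf not in node:
            raise KeyError(dotted_key)
        node[leaf] = value
    return merged


# Stand-in for the parsed config.yaml (only a few keys shown).
defaults = {
    "Global": {"text_score": 0.5},
    "Rec": {"rec_batch_num": 6, "rec_img_shape": [3, 48, 320]},
}
custom = apply_overrides(defaults, {"Rec.rec_batch_num": 1})
print(custom["Rec"]["rec_batch_num"], defaults["Rec"]["rec_batch_num"])
```

Working on a deep copy keeps the defaults reusable across several engine instances with different settings.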

+ 704 - 0
python/RapidOCR/python/rapidocr/default_models.yaml

@@ -0,0 +1,704 @@
+onnxruntime:
+  PP-OCRv4:
+    det:
+      ch_PP-OCRv4_det_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/det/ch_PP-OCRv4_det_infer.onnx
+        SHA256: d2a7720d45a54257208b1e13e36a8479894cb74155a5efe29462512d42f49da9
+      ch_PP-OCRv4_det_server_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/det/ch_PP-OCRv4_det_server_infer.onnx
+        SHA256: cfa39a3f298f6d3fc71789834d15da36d11a6c59b489fc16ea4733728012f786
+      en_PP-OCRv3_det_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/det/en_PP-OCRv3_det_infer.onnx
+        SHA256: ea07c15d38ac40cd69da3c493444ec75b44ff23840553ff8ba102c1219ed39c2
+      multi_PP-OCRv3_det_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/det/Multilingual_PP-OCRv3_det_infer.onnx
+        SHA256: 5475c6c7f4d84a6c4f32241b487435d59f126a40c023387af99732258844cdc3
+    rec:
+      arabic_PP-OCRv4_rec_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/arabic_PP-OCRv4_rec_infer.onnx
+        SHA256: 4a9011bef71687bb84288dc86ad2471bd5d37b717ddf672dd156f9e7a5601bac
+      ch_PP-OCRv4_rec_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/ch_PP-OCRv4_rec_infer.onnx
+        SHA256: 48fc40f24f6d2a207a2b1091d3437eb3cc3eb6b676dc3ef9c37384005483683b
+      ch_PP-OCRv4_rec_server_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/ch_PP-OCRv4_rec_server_infer.onnx
+        SHA256: 6a2676219be9907c7fc9cf61ebaa843bf2898777def567925b78886fcd90c07a
+      ch_doc_PP-OCRv4_rec_server_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/ch_doc_PP-OCRv4_rec_server_infer.onnx
+        SHA256: 7062242cd57ec40b6583c3e0a5d4a674accbd710eed90bad9f52b9dd341d43b3
+      chinese_cht_PP-OCRv3_rec_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/chinese_cht_PP-OCRv3_rec_infer.onnx
+        SHA256: 779656d044ce388045e02ea9244724616194e63928606436cdfc6dc3c9528cc6
+      cyrillic_PP-OCRv3_rec_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/cyrillic_PP-OCRv3_rec_infer.onnx
+        SHA256: 1efb65bdc460af1c0e8733d005b20952b17ca5aac10ddb56c968333791c5eaa3
+      devanagari_PP-OCRv4_rec_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/devanagari_PP-OCRv4_rec_infer.onnx
+        SHA256: a62b6148303187907aa0b0d3a0125bdc62557d07966468cab9056949e36035e8
+      en_PP-OCRv4_rec_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/en_PP-OCRv4_rec_infer.onnx
+        SHA256: e8770c967605983d1570cdf5352041dfb68fa0c21664f49f47b155abd3e0e318
+      japan_PP-OCRv4_rec_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/japan_PP-OCRv4_rec_infer.onnx
+        SHA256: e1075a67dba758ecfc7ebc78a10ae61c95ac8fb66a9c86fab5541e33f085cb7a
+      ka_PP-OCRv4_rec_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/ka_PP-OCRv4_rec_infer.onnx
+        SHA256: 9c1e186ea1d13cf6c853e57b42d382c3961fdd6acc2409e0d0dc44defc9f152b
+      korean_PP-OCRv4_rec_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/korean_PP-OCRv4_rec_infer.onnx
+        SHA256: ab151ba9065eccd98f884cf4d927db091be86137276392072edd4f9d43ad7426
+      latin_PP-OCRv3_rec_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/latin_PP-OCRv3_rec_infer.onnx
+        SHA256: e9d7a33667e8aaa702862975186adf2012e3f390cc0f9422865957125f8071cf
+      ta_PP-OCRv4_rec_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/ta_PP-OCRv4_rec_infer.onnx
+        SHA256: f78d752148873c5fa6e4294002bfd162dbba54236e406a39665ebbda766161b5
+      te_PP-OCRv4_rec_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/te_PP-OCRv4_rec_infer.onnx
+        SHA256: e608c3be00c8a9ea2f5c667d90f379403e2568bd5c8183308a49ca093def8eff
+    cls:
+      ch_ppocr_mobile_v2.0_cls_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/cls/ch_ppocr_mobile_v2.0_cls_infer.onnx
+        SHA256: e47acedf663230f8863ff1ab0e64dd2d82b838fceb5957146dab185a89d6215c
+  PP-OCRv5:
+    det:
+      ch_PP-OCRv5_mobile_det.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/det/ch_PP-OCRv5_mobile_det.onnx
+        SHA256: 4d97c44a20d30a81aad087d6a396b08f786c4635742afc391f6621f5c6ae78ae
+      ch_PP-OCRv5_server_det.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/det/ch_PP-OCRv5_server_det.onnx
+        SHA256: 0f8846b1d4bba223a2a2f9d9b44022fbc22cc019051a602b41a7fda9667e4cad
+    rec:
+      ch_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/ch_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: 5825fc7ebf84ae7a412be049820b4d86d77620f204a041697b0494669b1742c5
+      ch_PP-OCRv5_rec_server_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/ch_PP-OCRv5_rec_server_infer.onnx
+        SHA256: e09385400eaaaef34ceff54aeb7c4f0f1fe014c27fa8b9905d4709b65746562a
+      korean_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/korean_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: cd6e2ea50f6943ca7271eb8c56a877a5a90720b7047fe9c41a2e541a25773c9b
+      latin_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/latin_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: b20bd37c168a570f583afbc8cd7925603890efbcdc000a59e22c269d160b5f5a
+      eslav_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/eslav_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: 08705d6721849b1347d26187f15a5e362c431963a2a62bfff4feac578c489aab
+      en_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/en_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: c3461add59bb4323ecba96a492ab75e06dda42467c9e3d0c18db5d1d21924be8
+      th_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/th_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: de541dd83161c241ff426f7ecfd602a0ba77d686cf3ab9a6c255ea82fd08006e
+      el_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/el_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: b4368bccd557123c702b7549fee6cd1e94b581337d1c9b65310f109131542b7f
+      arabic_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/arabic_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: c1192e632d0baa9146ae5b756a0e635e3dc63c1733737ebfd1629e87144e9295
+      cyrillic_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/cyrillic_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: 90f761b4bfcce0c8c561c0cb5c887b0971d3ec01c32164bdf7374a35b0982711
+      devanagari_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/devanagari_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: d6f0a906580e3fa6b324a318718f1f31f268b6ea8ef985f91c2012a37f52c91e
+      ta_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/ta_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: a42448808b7dea87597336f12438935f40353f1949e8360acd9e06b4da21bfe1
+      te_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/te_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: a3690451b50028a09a3316a1274f7c05728151ea3f8fd392696397a7fefcbd92
+openvino:
+  PP-OCRv4:
+    det:
+      ch_PP-OCRv4_det_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/det/ch_PP-OCRv4_det_infer.onnx
+        SHA256: d2a7720d45a54257208b1e13e36a8479894cb74155a5efe29462512d42f49da9
+      ch_PP-OCRv4_det_server_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/det/ch_PP-OCRv4_det_server_infer.onnx
+        SHA256: cfa39a3f298f6d3fc71789834d15da36d11a6c59b489fc16ea4733728012f786
+      en_PP-OCRv3_det_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/det/en_PP-OCRv3_det_infer.onnx
+        SHA256: ea07c15d38ac40cd69da3c493444ec75b44ff23840553ff8ba102c1219ed39c2
+      multi_PP-OCRv3_det_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/det/Multilingual_PP-OCRv3_det_infer.onnx
+        SHA256: 5475c6c7f4d84a6c4f32241b487435d59f126a40c023387af99732258844cdc3
+    rec:
+      arabic_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/arabic_PP-OCRv4_rec_infer.onnx
+        SHA256: 4a9011bef71687bb84288dc86ad2471bd5d37b717ddf672dd156f9e7a5601bac
+      ch_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/ch_PP-OCRv4_rec_infer.onnx
+        SHA256: 48fc40f24f6d2a207a2b1091d3437eb3cc3eb6b676dc3ef9c37384005483683b
+      ch_PP-OCRv4_rec_server_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/ch_PP-OCRv4_rec_server_infer.onnx
+        SHA256: 6a2676219be9907c7fc9cf61ebaa843bf2898777def567925b78886fcd90c07a
+      ch_doc_PP-OCRv4_rec_server_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/ch_doc_PP-OCRv4_rec_server_infer.onnx
+        SHA256: 7062242cd57ec40b6583c3e0a5d4a674accbd710eed90bad9f52b9dd341d43b3
+      chinese_cht_PP-OCRv3_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/chinese_cht_PP-OCRv3_rec_infer.onnx
+        SHA256: 779656d044ce388045e02ea9244724616194e63928606436cdfc6dc3c9528cc6
+      cyrillic_PP-OCRv3_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/cyrillic_PP-OCRv3_rec_infer.onnx
+        SHA256: 1efb65bdc460af1c0e8733d005b20952b17ca5aac10ddb56c968333791c5eaa3
+      devanagari_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/devanagari_PP-OCRv4_rec_infer.onnx
+        SHA256: a62b6148303187907aa0b0d3a0125bdc62557d07966468cab9056949e36035e8
+      en_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/en_PP-OCRv4_rec_infer.onnx
+        SHA256: e8770c967605983d1570cdf5352041dfb68fa0c21664f49f47b155abd3e0e318
+      japan_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/japan_PP-OCRv4_rec_infer.onnx
+        SHA256: e1075a67dba758ecfc7ebc78a10ae61c95ac8fb66a9c86fab5541e33f085cb7a
+      ka_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/ka_PP-OCRv4_rec_infer.onnx
+        SHA256: 9c1e186ea1d13cf6c853e57b42d382c3961fdd6acc2409e0d0dc44defc9f152b
+      korean_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/korean_PP-OCRv4_rec_infer.onnx
+        SHA256: ab151ba9065eccd98f884cf4d927db091be86137276392072edd4f9d43ad7426
+      latin_PP-OCRv3_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/latin_PP-OCRv3_rec_infer.onnx
+        SHA256: e9d7a33667e8aaa702862975186adf2012e3f390cc0f9422865957125f8071cf
+      ta_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/ta_PP-OCRv4_rec_infer.onnx
+        SHA256: f78d752148873c5fa6e4294002bfd162dbba54236e406a39665ebbda766161b5
+      te_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/rec/te_PP-OCRv4_rec_infer.onnx
+        SHA256: e608c3be00c8a9ea2f5c667d90f379403e2568bd5c8183308a49ca093def8eff
+    cls:
+      ch_ppocr_mobile_v2.0_cls_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv4/cls/ch_ppocr_mobile_v2.0_cls_infer.onnx
+        SHA256: e47acedf663230f8863ff1ab0e64dd2d82b838fceb5957146dab185a89d6215c
+  PP-OCRv5:
+    det:
+      ch_PP-OCRv5_mobile_det.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/det/ch_PP-OCRv5_mobile_det.onnx
+        SHA256: 4d97c44a20d30a81aad087d6a396b08f786c4635742afc391f6621f5c6ae78ae
+      ch_PP-OCRv5_server_det.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/det/ch_PP-OCRv5_server_det.onnx
+        SHA256: 0f8846b1d4bba223a2a2f9d9b44022fbc22cc019051a602b41a7fda9667e4cad
+    rec:
+      ch_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/ch_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: 5825fc7ebf84ae7a412be049820b4d86d77620f204a041697b0494669b1742c5
+      ch_PP-OCRv5_rec_server_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/ch_PP-OCRv5_rec_server_infer.onnx
+        SHA256: e09385400eaaaef34ceff54aeb7c4f0f1fe014c27fa8b9905d4709b65746562a
+      korean_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/korean_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: cd6e2ea50f6943ca7271eb8c56a877a5a90720b7047fe9c41a2e541a25773c9b
+      latin_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/latin_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: b20bd37c168a570f583afbc8cd7925603890efbcdc000a59e22c269d160b5f5a
+      eslav_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/eslav_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: 08705d6721849b1347d26187f15a5e362c431963a2a62bfff4feac578c489aab
+      en_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/en_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: c3461add59bb4323ecba96a492ab75e06dda42467c9e3d0c18db5d1d21924be8
+      th_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/th_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: de541dd83161c241ff426f7ecfd602a0ba77d686cf3ab9a6c255ea82fd08006e
+      el_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/el_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: b4368bccd557123c702b7549fee6cd1e94b581337d1c9b65310f109131542b7f
+      arabic_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/arabic_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: c1192e632d0baa9146ae5b756a0e635e3dc63c1733737ebfd1629e87144e9295
+      cyrillic_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/cyrillic_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: 90f761b4bfcce0c8c561c0cb5c887b0971d3ec01c32164bdf7374a35b0982711
+      devanagari_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/devanagari_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: d6f0a906580e3fa6b324a318718f1f31f268b6ea8ef985f91c2012a37f52c91e
+      ta_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/ta_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: a42448808b7dea87597336f12438935f40353f1949e8360acd9e06b4da21bfe1
+      te_PP-OCRv5_rec_mobile_infer.onnx:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/onnx/PP-OCRv5/rec/te_PP-OCRv5_rec_mobile_infer.onnx
+        SHA256: a3690451b50028a09a3316a1274f7c05728151ea3f8fd392696397a7fefcbd92
+mnn:
+  PP-OCRv4:
+    det:
+      ch_PP-OCRv4_det_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/det/ch_PP-OCRv4_det_infer.mnn
+        SHA256: c46fbae33f0520460204c0321bfdfcf5d65c9a11b7fa750e6b86de69ef6ace34
+      ch_PP-OCRv4_det_server_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/det/ch_PP-OCRv4_det_server_infer.mnn
+        SHA256: 8618d06a704a86b1d606e538285420aafc53a4c4af27eedfa198c4f50ff638a7
+      en_PP-OCRv3_det_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/det/en_PP-OCRv3_det_infer.mnn
+        SHA256: b7147457b356e76b47ffb76483c363f567b27026a3e6bc59490240ae0e7ecf87
+      multilingual_PP-OCRv3_det_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/det/Multilingual_PP-OCRv3_det_infer.mnn
+        SHA256: 536d30c50edc1154c369af4063b2fa264e8bdada243131412e3310b035780703
+    cls:
+      ch_ppocr_mobile_v2.0_cls_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/cls/ch_ppocr_mobile_v2.0_cls_infer.mnn
+        SHA256: 6cb3cc2410929e6ca526e223a1a3e8eb7dcfeac46ed8d64a93e061a6a0eaad75
+    rec:
+      arabic_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/rec/arabic_PP-OCRv4_rec_infer.mnn
+        SHA256: e9d9972df0f9eff2d8053b218bfe9bdcab94da49ee8e965d8df199e366f514c6
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/arabic_PP-OCRv4_rec_infer/arabic_dict.txt
+      ch_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/rec/ch_PP-OCRv4_rec_infer.mnn
+        SHA256: 8f5855347b2900ff1a8fb897bad548d7c4013f050239adccf145485accb976e5
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ch_PP-OCRv4_rec_infer/ppocr_keys_v1.txt
+      ch_PP-OCRv4_rec_server_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/rec/ch_PP-OCRv4_rec_server_infer.mnn
+        SHA256: 836bf5cc61e9173e40a96206ae2e80b5fa68ffce2f977c8721a2882406bf6615
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ch_PP-OCRv4_rec_infer/ppocr_keys_v1.txt
+      ch_doc_PP-OCRv4_rec_server_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/rec/ch_doc_PP-OCRv4_rec_server_infer.mnn
+        SHA256: 508bdab9c76dcf9547ac2d42ae7e734ce82182ac35dbfa591d3d8d5e81070f64
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ch_doc_PP-OCRv4_rec_server_infer/ppocrv4_doc_dict.txt
+      chinese_cht_PP-OCRv3_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/rec/chinese_cht_PP-OCRv3_rec_infer.mnn
+        SHA256: 6c7307d2ef875f0510fdeb9b0c66a1d54f909363075287c04077f1f79159cae5
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/chinese_cht_PP-OCRv3_rec_infer/chinese_cht_dict.txt
+      cyrillic_PP-OCRv3_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/rec/cyrillic_PP-OCRv3_rec_infer.mnn
+        SHA256: c80de648491badfff29b6665870ede6665d00b19760dd39d8a4b98c3b3be0534
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/cyrillic_PP-OCRv3_rec_infer/cyrillic_dict.txt
+      devanagari_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/rec/devanagari_PP-OCRv4_rec_infer.mnn
+        SHA256: 4f8fd444ce4d128fdbe8ec65fe6e3d296f610bf0b73506db6f366c3d232135c2
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/devanagari_PP-OCRv4_rec_infer/devanagari_dict.txt
+      en_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/rec/en_PP-OCRv4_rec_infer.mnn
+        SHA256: cbd5a57d3cb48f489644445197b11afa9679d4e239d37b21ea51ab43c1d188f4
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/en_PP-OCRv4_rec_infer/en_dict.txt
+      japan_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/rec/japan_PP-OCRv4_rec_infer.mnn
+        SHA256: 495ebd93aec40e345f036686490384c72da068f35dc4f1a6749e2b58b90e5b10
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/japan_PP-OCRv4_rec_infer/japan_dict.txt
+      ka_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/rec/ka_PP-OCRv4_rec_infer.mnn
+        SHA256: b2ce72ab07926b518c3b7f1f6ed13d95a8d6f0da8afd110699be6c56699eb74b
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/kannada_PP-OCRv4_rec_infer/ka_dict.txt
+      korean_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/rec/korean_PP-OCRv4_rec_infer.mnn
+        SHA256: 813de13751c3202ade34783c10feb3ac6c9c43e7648e6b673ba783acbb389d88
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/korean_PP-OCRv4_rec_infer/korean_dict.txt
+      latin_PP-OCRv3_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/rec/latin_PP-OCRv3_rec_infer.mnn
+        SHA256: af8ac918401e5473b14f4890f385112bd5870ac3686a2e629b1f1c9bbee56b76
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/latin_PP-OCRv3_rec_infer/latin_dict.txt
+      ta_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/rec/ta_PP-OCRv4_rec_infer.mnn
+        SHA256: 81e3daab75a470a883b8aa8ec8cd3910c112b1f7ee2fe3b34a4cb52b23b1f187
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ta_PP-OCRv4_rec_infer/ta_dict.txt
+      te_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv4/rec/te_PP-OCRv4_rec_infer.mnn
+        SHA256: c924ff901a37945ca15b8a1ce830ea4a982bc1942f31c6e9102c559758dc7f3a
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/te_PP-OCRv4_rec_infer/te_dict.txt
+  PP-OCRv5:
+    det:
+      ch_PP-OCRv5_mobile_det:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/det/ch_PP-OCRv5_mobile_det.mnn
+        SHA256: 945745e4ee741fca2cce057c1f900132516425120b6d62e2d76a8e7fda0d8018
+      ch_PP-OCRv5_server_det:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/det/ch_PP-OCRv5_server_det.mnn
+        SHA256: bb241464b3af7229238c5b5659f8e48d7d17f18632b68d77eb381c3f47c644ef
+    rec:
+      arabic_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/rec/arabic_PP-OCRv5_rec_mobile_infer.mnn
+        SHA256: 98f9235a90c9457db4ecae565d050c7a681dd64f7a77247221132c039c330ec8
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/arabic_PP-OCRv5_rec_mobile_infer/ppocrv5_arabic_dict.txt
+      ch_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/rec/ch_PP-OCRv5_rec_mobile_infer.mnn
+        SHA256: 9416adfd544bd09287a13725b78cf4e4ee8ec30950b2bf99b5d7f065154496ed
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/ch_PP-OCRv5_rec_mobile_infer/ppocrv5_dict.txt
+      ch_PP-OCRv5_rec_server_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/rec/ch_PP-OCRv5_rec_server_infer.mnn
+        SHA256: c31b966c0d63ea4be8b94e59b9faf3c8b66cd1fc5010e74bd093cf831683b884
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/ch_PP-OCRv5_rec_server_infer/ppocrv5_dict.txt
+      cyrillic_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/rec/cyrillic_PP-OCRv5_rec_mobile_infer.mnn
+        SHA256: 342c4b104d9a87774f27e6b81dec5c79c212c9e4e1c2b9e99ee7997faf2b0506
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/cyrillic_PP-OCRv5_rec_mobile_infer/ppocrv5_cyrillic_dict.txt
+      devanagari_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/rec/devanagari_PP-OCRv5_rec_mobile_infer.mnn
+        SHA256: 58b0e768777b7f396815a6732654535c524cebe1a5019846040c10a9bca3dc3e
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/devanagari_PP-OCRv5_rec_mobile_infer/ppocrv5_devanagari_dict.txt
+      el_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/rec/el_PP-OCRv5_rec_mobile_infer.mnn
+        SHA256: a547d2860921900a14a4d79d0ea702ed233a8ded7e38176eac88c93911a203c7
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/el_PP-OCRv5_rec_mobile_infer/ppocrv5_el_dict.txt
+      en_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/rec/en_PP-OCRv5_rec_mobile_infer.mnn
+        SHA256: f3b9c63beee5a6c823bd350943882b7edd09aa3bc610dab230aad124a36dffb5
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/en_PP-OCRv5_rec_mobile_infer/ppocrv5_en_dict.txt
+      eslav_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/rec/eslav_PP-OCRv5_rec_mobile_infer.mnn
+        SHA256: 487a26a3c36a454be0e65ad6679c8f6526adeaf92f03945b6df8114a6db1e8c4
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/eslav_PP-OCRv5_rec_mobile_infer/ppocrv5_eslav_dict.txt
+      korean_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/rec/korean_PP-OCRv5_rec_mobile_infer.mnn
+        SHA256: c55ecd24f787a9fdcb33c3987ba220d291370c4691503a822f742d625422a830
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/korean_PP-OCRv5_rec_mobile_infer/ppocrv5_korean_dict.txt
+      latin_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/rec/latin_PP-OCRv5_rec_mobile_infer.mnn
+        SHA256: 6c2a566fb299f0c2e65ce6f801652871f66fe971032ec2f0de43e64bb0924cbb
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/latin_PP-OCRv5_rec_mobile_infer/ppocrv5_latin_dict.txt
+      ta_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/rec/ta_PP-OCRv5_rec_mobile_infer.mnn
+        SHA256: 939ab21eea3ec169077f24f1fbfa3a2567d40d0fbb97b060f26fe1a7d3215c25
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/ta_PP-OCRv5_rec_mobile_infer/ppocrv5_ta_dict.txt
+      te_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/rec/te_PP-OCRv5_rec_mobile_infer.mnn
+        SHA256: 11e841cedb81872d378e1a703acdc57e821104d81c459affa4da40d11e31a1df
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/te_PP-OCRv5_rec_mobile_infer/ppocrv5_te_dict.txt
+      th_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/mnn/PP-OCRv5/rec/th_PP-OCRv5_rec_mobile_infer.mnn
+        SHA256: ec69b91c4d88d864a01d3f0acde5a4ff4a1c1d2a74deb393f356ab1122ae4a91
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/th_PP-OCRv5_rec_mobile_infer/ppocrv5_th_dict.txt
+paddle:
+  PP-OCRv4:
+    det:
+      ch_PP-OCRv4_det_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/det/ch_PP-OCRv4_det_infer
+        inference.pdiparams: 49ee815e30cff43cb1057d33bf0d94193e4d4f1ae28451cad15b40be830df915
+        inference.pdiparams.info: 4e0f29e5fa476126f9efe0dc69f7ca7be960705ad549385d30c9604193463e69
+        inference.pdmodel: 2ad68ed2768fe6c41166a5bc64680cc9f445390acb6528da449a4db2f7b90e14
+      ch_PP-OCRv4_det_server_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/det/ch_PP-OCRv4_det_server_infer
+        inference.pdiparams: 48cecf9e30e9261a634126bc5211a2736508ae89d2d6199ef2ede06cdebe678f
+        inference.pdiparams.info: 231d32f0fc329cffcd3f756ef6ab6899663222a6b655b577400a2dd9cb8d2ac8
+        inference.pdmodel: fec18291e9994c66a8c6877c7194c094071f115f7fd9f48b4da40e966749b03a
+      en_PP-OCRv3_det_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/det/en_PP-OCRv3_det_infer
+        inference.pdiparams: 83676ec730627ab4502f401410a4b6a3ce1c0bb98fa249b71db055b6bddae051
+        inference.pdiparams.info: 2fe414d9eadf914bf44e3f9ba212988a6f26f364e4f87c6d0af57438ffffc0c4
+        inference.pdmodel: c4bfb1b05d9d1d5a760801eaf6d20180ef7e47bcc675fb17d1f3a89da5fef427
+      multi_PP-OCRv3_det_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/det/Multilingual_PP-OCRv3_det_infer
+        inference.pdiparams: a6119e639cade15873ac32eb4bee121b63c811c4d007d9f618baf680aef8b857
+        inference.pdiparams.info: 2fe414d9eadf914bf44e3f9ba212988a6f26f364e4f87c6d0af57438ffffc0c4
+        inference.pdmodel: c49f664c75549678deedcaff8d11f8f2b9f1303ef6c72deccdb98a5ea2104381
+    rec:
+      arabic_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/arabic_PP-OCRv4_rec_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/arabic_PP-OCRv4_rec_infer/arabic_dict.txt
+        inference.pdiparams: b388bf744151963a21b9ee9c60b4624ba90976ae5f1f1b023e594032b16ce40f
+        inference.pdiparams.info: 710d9f3d7c503067ae708f4bc3adc6973c0d6391adb7a1470c36eb70da6b5b83
+        inference.pdmodel: c9a2f2e222937a934c5c23fd95dbba5b62a3f6d822a14f86cf8b3e9b93632c64
+      ch_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ch_PP-OCRv4_rec_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ch_PP-OCRv4_rec_infer/ppocr_keys_v1.txt
+        inference.pdiparams: a6dbfa63e7ee161688523c954e9e293f77dc24044db81e836ff9c7f103fd191a
+        inference.pdiparams.info: 0706cc669bdce382f582c139906076b170b112c51bf32573fe43438025777f12
+        inference.pdmodel: bf78f3898a004615e69c676259d8171bd7ae99000653b41fd72aaa36ae8bd304
+      ch_PP-OCRv4_rec_server_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ch_PP-OCRv4_rec_server_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ch_PP-OCRv4_rec_infer/ppocr_keys_v1.txt
+        inference.pdiparams: 0319d16cb29b01788757164083da6512a5defb9bccdff5148a929bd43b2cd58f
+        inference.pdiparams.info: 3aad501d0aa2bb2785b90e184c8c5d21718a72c5b896e4493856eb8936ab8455
+        inference.pdmodel: d3cbe31c26670a275c74bba052a49925e24452243f6f646a9ef92c4f5acffd37
+      ch_doc_PP-OCRv4_rec_server_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ch_doc_PP-OCRv4_rec_server_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ch_doc_PP-OCRv4_rec_server_infer/ppocrv4_doc_dict.txt
+        inference.pdiparams: 4ad3cf3f353ae3353515266c5e96ef49e9de27dd6cbe413b0234192384203bab
+        inference.json: ad20b17e9306f1f6ca00a875dfd5206837513b53d5052fe4b56050468f4751b8
+        inference.yml: 05cfc785baf5108048843cf88e25c200f06f81a19a484d82dc76ee9be16b9a03
+      chinese_cht_PP-OCRv3_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/chinese_cht_PP-OCRv3_rec_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/chinese_cht_PP-OCRv3_rec_infer/chinese_cht_dict.txt
+        inference.pdiparams: 53cdd14ef365d43a3de5357dea00935c56ae6d8f0f7d464b8aee187df8bdea47
+        inference.pdiparams.info: cf79f0b9689b4d6b8094d8bfe2481dc4b4d1699adb622568384695b5f56dc600
+        inference.pdmodel: 343da1722eae396309e365f9a98bca53f55ca71cf757743abaf83dce9d805bc0
+      cyrillic_PP-OCRv3_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/cyrillic_PP-OCRv3_rec_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/cyrillic_PP-OCRv3_rec_infer/cyrillic_dict.txt
+        inference.pdiparams: 2ef815afbb8970610618561946ce86faf60745ada64cd316ed34bfe34bdbf46f
+        inference.pdiparams.info: cf79f0b9689b4d6b8094d8bfe2481dc4b4d1699adb622568384695b5f56dc600
+        inference.pdmodel: 377bee205d0ba2d401cb463c300366ab87b2373b45dd9c8e3b560166b064beee
+      devanagari_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/devanagari_PP-OCRv4_rec_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/devanagari_PP-OCRv4_rec_infer/devanagari_dict.txt
+        inference.pdiparams: 3effa6613d040b79aac36699a3c97fe835fc91791db2010d614cc9ffb08ee415
+        inference.pdiparams.info: 710d9f3d7c503067ae708f4bc3adc6973c0d6391adb7a1470c36eb70da6b5b83
+        inference.pdmodel: 854c6d41d186be07588f3cd4ced044270ad96afd7f31ee4f63a70e1d3714a9d1
+      en_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/en_PP-OCRv4_rec_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/en_PP-OCRv4_rec_infer/en_dict.txt
+        inference.pdiparams: 75f64a1ffb70c56b7a25655963ca16f5bf3286202e3f52ac972bee05cdee2f56
+        inference.pdiparams.info: 710d9f3d7c503067ae708f4bc3adc6973c0d6391adb7a1470c36eb70da6b5b83
+        inference.pdmodel: 85b952f05f709af259cfe4254012aa7208bef0998f71f57a15495446f25ccd43
+      japan_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/japan_PP-OCRv4_rec_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/japan_PP-OCRv4_rec_infer/japan_dict.txt
+        inference.pdiparams: e6e9b7395bb8476e85a15776aa09445c483778c06a512921cbc9ba81c6a2719a
+        inference.pdiparams.info: 710d9f3d7c503067ae708f4bc3adc6973c0d6391adb7a1470c36eb70da6b5b83
+        inference.pdmodel: d3339a79fa88863f639a00cd57e973cc278fa46c3eb369b88cb24bb06c96b89d
+      ka_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/kannada_PP-OCRv4_rec_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/kannada_PP-OCRv4_rec_infer/ka_dict.txt
+        inference.pdiparams: 580fe5bc0a45956569bc63bda99b27f33505f20e519fa43517d1d5b9bcbb029b
+        inference.pdiparams.info: 710d9f3d7c503067ae708f4bc3adc6973c0d6391adb7a1470c36eb70da6b5b83
+        inference.pdmodel: bb23a7045f9f03c414f727c75752660d51760f02da2023e258bcc014f585f8e8
+      korean_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/korean_PP-OCRv4_rec_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/korean_PP-OCRv4_rec_infer/korean_dict.txt
+        inference.pdiparams: ea4f4762582336e9b85653b016c5b1091223c947fafd282161fa3145ecb72d1d
+        inference.pdiparams.info: 533380ead7fe161fc32ac0bdc966e65c18c0d9d1a16088736c4aa3c628701374
+        inference.pdmodel: 992eed58af98ed6125065027e52fbc69b570e6c2e74610ec240e8a2436151ada
+      latin_PP-OCRv3_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/latin_PP-OCRv3_rec_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/latin_PP-OCRv3_rec_infer/latin_dict.txt
+        inference.pdiparams: fa6f22a8b2a669c7d5b4dfd2607491abd4cd2467adf7077c9cc70ce8c47b61d9
+        inference.pdiparams.info: cf79f0b9689b4d6b8094d8bfe2481dc4b4d1699adb622568384695b5f56dc600
+        inference.pdmodel: 23e11a537d8cd7ebab669a9e48fb12f49c295ca9ecde398c6b5460830bd174b8
+      ta_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ta_PP-OCRv4_rec_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ta_PP-OCRv4_rec_infer/ta_dict.txt
+        inference.pdiparams: b0ae170098a2f706de3c60496f5730b5ea4afe1d6e679a67ab6f89aa1018ffde
+        inference.pdiparams.info: 533380ead7fe161fc32ac0bdc966e65c18c0d9d1a16088736c4aa3c628701374
+        inference.pdmodel: 42ddd121b43ab9ef399e59705bcde6c56891a531c6df87a467aa99c5db54926f
+      te_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/te_PP-OCRv4_rec_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/te_PP-OCRv4_rec_infer/te_dict.txt
+        inference.pdiparams: 6efc7c74d4ff86b4187ece2f3b4f6ecd775c3a780b82d27d39a9c506d6e9eb42
+        inference.pdiparams.info: 533380ead7fe161fc32ac0bdc966e65c18c0d9d1a16088736c4aa3c628701374
+        inference.pdmodel: f07501e6a3e96aa065ed5a656c8aee5bff9694ac8cb51826178dd4f34e884333
+    cls:
+      ch_ppocr_mobile_v2.0_cls_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/cls/ch_ppocr_mobile_v2_cls_infer
+        inference.pdiparams: d1efda1b80e174b4fcb168a035ac96c1af4938892bd86a55f300a6027105d08c
+        inference.pdiparams.info: ee0c2eaa4c09814802bf520c01dfdbc1345dc2879dc9e67424d32c7b0ee88e59
+        inference.pdmodel: 3c4337ec61722a20b1dca2e5bfaffc313c0592bc89ad6e0d45168224186f6683
+  PP-OCRv5:
+    det:
+      ch_PP-OCRv5_mobile_det:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/det/ch_PP-OCRv5_mobile_det
+        inference.pdiparams: afa1820cb16c1fd0dad589d0f8b389139061c1ef6d68019685fd07be997dda5b
+        inference.json: 05feef1acb00aa4cd7362b15f7f501fc4f99d7b1fa73c1c871e0c7b1504b0f5c
+        inference.yml: 98069072e1b6b37d727fd9d9f11725faa46d6ea0de012f2ed26caea011c37699
+      ch_PP-OCRv5_server_det:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/det/ch_PP-OCRv5_server_det
+        inference.pdiparams: 183146fe9d9910352f68482f623bcbbb9fa7b9e8fa1463b9ad288cef00524d2d
+        inference.json: af5876933d8806a1b50d895867e0781e135cd92ff37381992828fc8d1b842d28
+        inference.yml: 28fb721efc3634fc8aa677e474b9602cb815a91cf569ef357a7a553d7b3ce685
+    rec:
+      ch_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/ch_PP-OCRv5_rec_mobile_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/ch_PP-OCRv5_rec_mobile_infer/ppocrv5_dict.txt
+        inference.pdiparams: 2460da90875937c94db97eba74ae3d9e5d4c4c57c42f1f41531c09a26bcc771a
+        inference.json: 24587345250c7332d0fc6f9a44e794d078cdaeb64c302fef906f325619de2569
+        inference.yml: 5dfeb2777f6d0db8177d8128a8acfcf6e6276dc4ac73ea3bf0dc06d6a5e85d8e
+      ch_PP-OCRv5_rec_server_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/ch_PP-OCRv5_rec_server_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/ch_PP-OCRv5_rec_server_infer/ppocrv5_dict.txt
+        inference.pdiparams: 63853f062a5f4089befc16f565a68277618e0da5cb45468b49d11079de0ada77
+        inference.json: 8e6e12e5d42531840310977fffb58165bf889fc5061408c5a8afdb6985f47fcb
+        inference.yml: 2c719dba044c4e2228aef8ff92f5f575394d75d24c16de096a33b7cfd902f66d
+      korean_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/korean_PP-OCRv5_rec_mobile_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/korean_PP-OCRv5_rec_mobile_infer/ppocrv5_korean_dict.txt
+        inference.pdiparams: cac3e5f12cf04aaa77f6a5bc704e4e736ef2908476551891d84b41b4e9090462
+        inference.json: 562404e3c590c50c93778d5f0a94df21b47b5ab8f3ea6d47c7f8a7930c3bc844
+        inference.yml: f757fa1c40e99edcf27e9cce879b93eb2a51fa46f5ef39095689b8c37dd75998
+      latin_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/latin_PP-OCRv5_rec_mobile_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/latin_PP-OCRv5_rec_mobile_infer/ppocrv5_latin_dict.txt
+        inference.pdiparams: ecebeaac46267ecaa9f21025ac3ba8972a93c4a906c505ceb8a525299e466e0a
+        inference.json: ea8f315e69b96e33a1339dde868651eadb4aaa62ce92e70c546f4b25e43134e7
+        inference.yml: 13cfe9c251d13aaabc619c135ff1724444d0a9e247bd8e0b2f4a76298c87eba0
+      eslav_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/eslav_PP-OCRv5_rec_mobile_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/eslav_PP-OCRv5_rec_mobile_infer/ppocrv5_eslav_dict.txt
+        inference.pdiparams: f11057b05d8517868bca505271278973d706600d9dcc184cbcf5c4512091c32b
+        inference.json: 3fb6e2e658f5139ff16e35260de8f0577f106a9505c902e1dfc1f4f1d03cc9cb
+        inference.yml: 025039bac23eb4a308efcefa4d58eab3af440767815c6ba6938468bf6353ee5a
+      en_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/en_PP-OCRv5_rec_mobile_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/en_PP-OCRv5_rec_mobile_infer/ppocrv5_en_dict.txt
+        inference.pdiparams: 3ec8a97ed6cefe8568d3e2ee90bb193299b566a7661aa4fd52d224b96b59f66b
+        inference.json: fd1b6ec722ea841a72d3ba43e527df1d1066d5d7808e0503ee3eec7265188753
+        inference.yml: 27e91d0582f40168aa218303c76e184bc78fa7a5d105aad0cfbad8458b441067
+      th_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/th_PP-OCRv5_rec_mobile_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/th_PP-OCRv5_rec_mobile_infer/ppocrv5_th_dict.txt
+        inference.pdiparams: 45ec91f2322b58b8d30ba27d18fcfdbb8bf388b918dd978162d2af91e0c66d4b
+        inference.json: b92a9442af5ac6da3c6ba30ce1672d00a2d7a0fbd8068ed543ac6dcb9d5c7764
+        inference.yml: f6ba7fefc38ca1ff398ddafa75d67d16e0b3757c4e6c833adffee98a981766c9
+      el_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/el_PP-OCRv5_rec_mobile_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/el_PP-OCRv5_rec_mobile_infer/ppocrv5_el_dict.txt
+        inference.pdiparams: 4d69bbb8ed9f84373631d121ab459f8583cd978df813b5fb9b139b7783b05fbd
+        inference.json: 7fb97d32c52562deb0a0654e0839acbb54cacc0441a857cc1d66e12892337e41
+        inference.yml: 17d85b2fe2d2f24cd4ab07bcbc33e0c126859b956ced36e281dc65e2d0c1f0bf
+      arabic_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/arabic_PP-OCRv5_rec_mobile_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/arabic_PP-OCRv5_rec_mobile_infer/ppocrv5_arabic_dict.txt
+        inference.json: d43832804d89c0e18d59f8446a3babd0ef60c44bc335efddac3176caf71d50af
+        inference.pdiparams: 4b4271fd1dd89a40b2056e1a42e58de7c0df2fdcf1a97ca9a8916d7ec45f9143
+        inference.yml: 21368419e6c016c31db55d316d59e11c128e1913e6e6fe10287084710043d3a6
+      cyrillic_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/cyrillic_PP-OCRv5_rec_mobile_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/cyrillic_PP-OCRv5_rec_mobile_infer/ppocrv5_cyrillic_dict.txt
+        inference.json: 5d90f1bfca52d80c01de176c5238fae2459995a99ff1dbfe5319ab4ed1735df2
+        inference.pdiparams: 434dc9fa2a99fa3653e08f8cf793ae56be7dd41c35c4980e6255147cc02bbc80
+        inference.yml: 5c76cc91fa98410178a09f498db10050d0ec1634a660053d3005ab7be581f501
+      devanagari_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/devanagari_PP-OCRv5_rec_mobile_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/devanagari_PP-OCRv5_rec_mobile_infer/ppocrv5_devanagari_dict.txt
+        inference.json: 20d3b121184ec6b7de1705015022f4b22117990f6e5728a5d21c314bac9c1444
+        inference.pdiparams: 719be7d20bfe9530e2deae324c999e9911087496bce5e70846767c448d023a01
+        inference.yml: 9bd172dd26440c8ce94d1cde5d5baea6aefdc7cf3c5c8492e0beedef656d4e54
+      ta_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/ta_PP-OCRv5_rec_mobile_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/ta_PP-OCRv5_rec_mobile_infer/ppocrv5_ta_dict.txt
+        inference.json: 9f845b15eef3272a47201fa0e411f8e466392b7dcfd705ade2890bf434b70370
+        inference.pdiparams: 7e7bb6c9cbcc401aeaceb0cdace485dbfcc15a410f299294e7c3123bbcbc68f0
+        inference.yml: 88a28f5a1bb30cabe38a0985cb5e6619fa4f0c7c78e57a08274674228c5219a6
+      te_PP-OCRv5_rec_mobile_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/te_PP-OCRv5_rec_mobile_infer
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/te_PP-OCRv5_rec_mobile_infer/ppocrv5_te_dict.txt
+        inference.json: ad866e22872fbc0ee498b2c47d9537bd8f49f125353e43d82a4d173b595e6bfb
+        inference.pdiparams: 45967d00d6b4af590221733bf0d93791babc1feb17b98da401dba53d3cf110c9
+        inference.yml: acebbe53f1831bf28ddfed75aedf58225d7aa5d09100c1d5a9140a2a53b137ce
+torch:
+  PP-OCRv4:
+    det:
+      ch_PP-OCRv4_det_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/det/ch_PP-OCRv4_det_infer.pth
+        SHA256: 89622c3f3e76b3ac7d10d9434c1f117a7471dba44723885cc04b49932a740d5b
+      ch_PP-OCRv4_det_server_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/det/ch_PP-OCRv4_det_server_infer.pth
+        SHA256: 2ccb02359eb1e56c72ba3cc21d166ed32a517863554f19fc6ec1b663fe4a6652
+      en_PP-OCRv3_det_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/det/en_PP-OCRv3_det_infer.pth
+        SHA256: 62aba369c8245f131bb08348a7d9c3135234d5c139ec92b34c8ac7b9ba7c2846
+      multi_PP-OCRv3_det_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/det/Multilingual_PP-OCRv3_det_infer.pth
+        SHA256: 05eb1c89030b269b830ba7f2d424a4ac80c7593ea1795fef9777fedbc18e383f
+    rec:
+      arabic_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/rec/arabic_PP-OCRv4_rec_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/arabic_PP-OCRv4_rec_infer/arabic_dict.txt
+        SHA256: 4dced07241dba41911aca9b5b2c3505e4c041d836d7225700007981375d4f387
+      ch_PP-OCRv4_rec_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/rec/ch_PP-OCRv4_rec_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ch_PP-OCRv4_rec_infer/ppocr_keys_v1.txt
+        SHA256: cb4265bb4300a2487e93e82ccfa1924bf9cd1194c1a202ab17a96b4911c27e0b
+      ch_PP-OCRv4_rec_server_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/rec/ch_PP-OCRv4_rec_server_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ch_PP-OCRv4_rec_infer/ppocr_keys_v1.txt
+        SHA256: 2c0c9f5180ae3e4d8ea9d3830116ac49900abbb2af3985db02c2bbf484bb0bf9
+      chinese_cht_PP-OCRv3_rec_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/rec/chinese_cht_PP-OCRv3_rec_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/chinese_cht_PP-OCRv3_rec_infer/chinese_cht_dict.txt
+        SHA256: e1a636246ca348d481161fb4e2d3dc4aea4223e71fc50b0e72984ebaa322cbb4
+      cyrillic_PP-OCRv3_rec_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/rec/cyrillic_PP-OCRv3_rec_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/cyrillic_PP-OCRv3_rec_infer/cyrillic_dict.txt
+        SHA256: 77d16690e6b3f16c2c907a7dcfe1c3722169fd22913fcfe84ee54069b8e5798d
+      devanagari_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/rec/devanagari_PP-OCRv4_rec_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/devanagari_PP-OCRv4_rec_infer/devanagari_dict.txt
+        SHA256: 423c6d25e3c0db42192fdb2e87af37fed3e772f2433fefad0e13572b87788ad7
+      en_PP-OCRv4_rec_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/rec/en_PP-OCRv4_rec_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/en_PP-OCRv4_rec_infer/en_dict.txt
+        SHA256: f9723c94847de59df9b059faaed041fbb35f014ea973da40290b1e97004e1d4e
+      japan_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/rec/japan_PP-OCRv4_rec_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/japan_PP-OCRv4_rec_infer/japan_dict.txt
+        SHA256: de4549a131821f7456a16175e6aea85c4fda5a6cec10b119c3c2d54c02ceb06f
+      ka_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/rec/ka_PP-OCRv4_rec_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/kannada_PP-OCRv4_rec_infer/ka_dict.txt
+        SHA256: 4a537f8aa90afb4f3bb63d0950c2d408b18d586509956b4f56652ef0829764f3
+      korean_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/rec/korean_PP-OCRv4_rec_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/korean_PP-OCRv4_rec_infer/korean_dict.txt
+        SHA256: f50614ab06416037454ac2cc6ade9a758f0222f518254fd57dd65f428f763a01
+      latin_PP-OCRv3_rec_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/rec/latin_PP-OCRv3_rec_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/latin_PP-OCRv3_rec_infer/latin_dict.txt
+        SHA256: caf8e0f2572a7dea2d901c7f50bd78fc310a4246a655d062b69f30258c15bf90
+      ta_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/rec/ta_PP-OCRv4_rec_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/ta_PP-OCRv4_rec_infer/ta_dict.txt
+        SHA256: 16f42da51ddbad9acbb2a0372ddc2432c49ba3d1ccabc98acec55a4684b6a644
+      te_PP-OCRv4_rec_infer:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/rec/te_PP-OCRv4_rec_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv4/rec/te_PP-OCRv4_rec_infer/te_dict.txt
+        SHA256: 0597e97b27be9298d703267ed28227abfa6c0a60c07817575b5fa7d2124e7dbc
+    cls:
+      ch_ptocr_mobile_v2.0_cls_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/cls/ch_ptocr_mobile_v2.0_cls_infer.pth
+        SHA256: bfe13860824b3365c0c7f7ccfcddc8ff11645c60051739ff18bc9913f60c98e1
+  PP-OCRv5:
+    det:
+      ch_PP-OCRv5_det_mobile_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv5/det/ch_PP-OCRv5_det_mobile_infer.pth
+        SHA256: df848ed5060bac4d0f6e58572aea97d92e909a8a87cf292849237b0e84f6ffdb
+      ch_PP-OCRv5_det_server_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv5/det/ch_PP-OCRv5_det_server_infer.pth
+        SHA256: 607403c8233149f61c39f8e3302270625fab9fa2852c4329fc3c8f49cdb41e70
+    rec:
+      ch_PP-OCRv5_rec_mobile_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv5/rec/ch_PP-OCRv5_rec_mobile_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/ch_PP-OCRv5_rec_mobile_infer/ppocrv5_dict.txt
+        SHA256: d20ee8dac2ca63e2d1989b02ecc42595c71d61bf8dd8c8ddc5ad2ee68e7b5be2
+      ch_PP-OCRv5_rec_server_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv5/rec/ch_PP-OCRv5_rec_server_infer.pth
+        dict_url: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/paddle/PP-OCRv5/rec/ch_PP-OCRv5_rec_server_infer/ppocrv5_dict.txt
+        SHA256: 4767ddc90c1532ec01d881a980dae0a0b92679f4f82f88c4e9f92563de69e740
+    cls:
+      ch_ptocr_mobile_v2.0_cls_infer.pth:
+        model_dir: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/torch/PP-OCRv4/cls/ch_ptocr_mobile_v2.0_cls_infer.pth
+        SHA256: bfe13860824b3365c0c7f7ccfcddc8ff11645c60051739ff18bc9913f60c98e1
+fonts:
+  ch:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/FZYTK.TTF
+    SHA256: 4065a23df6823c8e2b69a0e76d02f02a6470b8774a5e91086609701ad95cc33f
+  ch_doc:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/FZYTK.TTF
+    SHA256: 4065a23df6823c8e2b69a0e76d02f02a6470b8774a5e91086609701ad95cc33f
+  en:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/FZYTK.TTF
+    SHA256: 4065a23df6823c8e2b69a0e76d02f02a6470b8774a5e91086609701ad95cc33f
+  japan:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/japan.ttc
+    SHA256: 11122490a5e3a862015c8894183750de59abf95c3936d63d5978293d92f23dba
+  arabic:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/arabic.ttf
+    SHA256: e38162048fca91187a7c3d73a4a1109ca0490d6bac8acf3ff32a57f5fff7e5b6
+  chinese_cht:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/chinese_cht.ttf
+    SHA256: 5ce814960d0cdea1dd647180636babc1cf6a0acf0a9a9019424f4689acedd9ea
+  cyrillic:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/cyrillic.ttf
+    SHA256: 084768d29859a62b735387fb5946dfe61fb3d844031c7c51c1668d8afbe3b802
+  eslav:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/cyrillic.ttf
+    SHA256: 084768d29859a62b735387fb5946dfe61fb3d844031c7c51c1668d8afbe3b802
+  devanagari:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/devanagari_Martel-Regular.ttf
+    SHA256: b37a04d22ee4700f5018d39678ebb2b9fc64c9d06cd9ec6f9f3fb4c27870a63b
+  korean:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/korean.ttf
+    SHA256: 0897316bdb2e308cea2841c54940f2ef5707856000aa07910c8bff39a47e40bd
+  ka:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/kannada.ttf
+    SHA256: b337386a8e853ccba53c0c248bd06f025d7667b800ba74c72c66040d67315c6e
+  latin:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/latin.ttf
+    SHA256: 1562fe5cbdaacab4a5880d6404ba05245d12f3a4478fe16021e976bc725ce5d5
+  ta:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/tamil.ttf
+    SHA256: b771ac413157f6b1f1a52fb8ff1b56057f4b492fcce385ddd32ca12eee0c73b0
+  te:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/telugu.ttf
+    SHA256: 7f82ab141b77d263f9ea9b31b47faf50c11310f42fce6d9dffeaaa334909bbf9
+  th:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/th.ttf
+    SHA256: b8150084e25734e6f31696c57ff009f5564efa09d295848b717d9e2328c0311d
+  el:
+    path: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.7.0/resources/fonts/el.ttf
+    SHA256: 892e6c81071fc85923f8d0ff925edab6049dd4495a2fdedffddc04753bf348b3
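
Review note: every entry in the manifest above pairs a download URL with a SHA256 digest. How `DownloadFile` uses that digest is not shown in this part of the diff, so the following is only a minimal sketch of what such a check could look like. The helper names `sha256_of_file` and `matches_manifest` are hypothetical and are not part of RapidOCR.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_manifest(path: Union[str, Path], expected_sha256: str) -> bool:
    """Compare a local file against the hex digest stored in the manifest (hypothetical helper)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

A caller would typically re-download the file when `matches_manifest` returns `False`, since a mismatch usually means a truncated or stale download.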

+ 3 - 0
python/RapidOCR/python/rapidocr/inference_engine/__init__.py

@@ -0,0 +1,3 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com

+ 162 - 0
python/RapidOCR/python/rapidocr/inference_engine/base.py

@@ -0,0 +1,162 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+import abc
+from dataclasses import dataclass
+from enum import Enum
+from pathlib import Path
+from typing import Dict, List, Union
+
+import numpy as np
+from omegaconf import OmegaConf
+
+from ..utils.log import logger
+from ..utils.typings import EngineType, ModelType, OCRVersion, TaskType
+from ..utils.utils import import_package
+
+cur_dir = Path(__file__).resolve().parent.parent
+MODEL_URL_PATH = cur_dir / "default_models.yaml"
+
+
+def get_engine(engine_type: EngineType):
+    logger.info("Using engine_name: %s", engine_type.value)
+
+    if engine_type == EngineType.ONNXRUNTIME:
+        if not import_package(engine_type.value):
+            raise ImportError(f"{engine_type.value} is not installed.")
+
+        from .onnxruntime import OrtInferSession
+
+        return OrtInferSession
+
+    if engine_type == EngineType.OPENVINO:
+        if not import_package(engine_type.value):
+            raise ImportError(f"{engine_type.value} is not installed")
+
+        from .openvino import OpenVINOInferSession
+
+        return OpenVINOInferSession
+
+    if engine_type == EngineType.PADDLE:
+        if not import_package(engine_type.value):
+            raise ImportError(f"{engine_type.value} is not installed")
+
+        from .paddle import PaddleInferSession
+
+        return PaddleInferSession
+
+    if engine_type == EngineType.TORCH:
+        if not import_package(engine_type.value):
+            raise ImportError(f"{engine_type.value} is not installed")
+
+        from .pytorch import TorchInferSession
+
+        return TorchInferSession
+
+    if engine_type == EngineType.TENSORRT:
+        if not import_package("tensorrt"):
+            raise ImportError("tensorrt is not installed")
+
+        from .tensorrt import TRTInferSession
+
+        return TRTInferSession
+
+    if engine_type == EngineType.MNN:
+        if not import_package("MNN"):
+            raise ImportError("MNN is not installed")
+
+        from .mnn import MNNInferSession
+
+        return MNNInferSession
+
+    raise ValueError(f"Unsupported engine: {engine_type.value}")
+
+
+@dataclass
+class FileInfo:
+    engine_type: EngineType
+    ocr_version: OCRVersion
+    task_type: TaskType
+    lang_type: Enum
+    model_type: ModelType
+
+
+class InferSession(abc.ABC):
+    model_info = OmegaConf.load(MODEL_URL_PATH)
+    DEFAULT_MODEL_PATH = cur_dir / "models"
+
+    @abc.abstractmethod
+    def __init__(self, config):
+        pass
+
+    @abc.abstractmethod
+    def __call__(self, input_content: np.ndarray) -> np.ndarray:
+        pass
+
+    @staticmethod
+    def _verify_model(model_path: Union[str, Path, None]):
+        if model_path is None:
+            raise ValueError("model_path is None!")
+
+        model_path = Path(model_path)
+        if not model_path.exists():
+            raise FileNotFoundError(f"{model_path} does not exist.")
+
+        if not model_path.is_file():
+            raise FileExistsError(f"{model_path} is not a file.")
+
+    @abc.abstractmethod
+    def have_key(self, key: str = "character") -> bool:
+        pass
+
+    @abc.abstractmethod
+    def get_character_list(self, key: str = "character") -> List[str]:
+        pass
+
+    @classmethod
+    def get_model_url(cls, file_info: FileInfo) -> Dict[str, str]:
+        engine_type = file_info.engine_type.value
+        ocr_version = file_info.ocr_version.value
+        task_type = file_info.task_type.value
+        lang_type = file_info.lang_type.value
+        model_type = file_info.model_type.value
+
+        model_dict = OmegaConf.select(
+            cls.model_info, f"{engine_type}.{ocr_version}.{task_type}"
+        )
+
+        if not model_dict:
+            raise ValueError(
+                f"Unsupported configuration: {engine_type}.{ocr_version}.{task_type}.{model_type}"
+            )
+
+        # Look for a server model first
+        if model_type == ModelType.SERVER.value:
+            for k in model_dict:
+                if k.startswith(lang_type) and model_type in k:
+                    return model_dict[k]
+
+        for k in model_dict:
+            if k.startswith(lang_type):
+                return model_dict[k]
+
+        logger.error(
+            "Unsupported configuration:\n"
+            f"  engine_type   = {engine_type}\n"
+            f"  ocr_version   = {ocr_version}\n"
+            f"  task_type     = {task_type}\n"
+            f"  lang_type     = {lang_type}\n"
+            "\n"
+            "Please refer to the official model list for supported combinations:\n"
+            "https://rapidai.github.io/RapidOCRDocs/main/model_list/\n"
+            "\n"
+            "Example valid usage:\n"
+            "  from rapidocr import LangRec, OCRVersion, RapidOCR\n"
+            "  engine = RapidOCR(params={'Rec.ocr_version': OCRVersion.PPOCRV5, 'Rec.lang_type': LangRec.CH})",
+        )
+        raise ValueError("Invalid OCR configuration.")
+
+    @classmethod
+    def get_dict_key_url(cls, file_info: FileInfo) -> str:
+        model_dict = cls.get_model_url(file_info)
+        return model_dict["dict_url"]
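
Review note: `get_model_url` above selects an entry by key prefix, trying a server-specific key first and then falling back to the first key that starts with the language code. The sketch below reproduces that selection order on a plain `dict` so the behaviour is easy to check in isolation. It assumes that `ModelType.SERVER.value` is the string `"server"` (the enum is not shown in this diff); the function name `pick_model_key` and the sample data are hypothetical.

```python
from typing import Dict


def pick_model_key(model_dict: Dict[str, dict], lang_type: str, model_type: str) -> str:
    """Return the first matching key, preferring server models when requested."""
    # Assumption: the server model type is spelled "server", as in the manifest keys.
    if model_type == "server":
        for key in model_dict:
            if key.startswith(lang_type) and model_type in key:
                return key

    # Fallback: the first key with the language prefix wins, in insertion order.
    for key in model_dict:
        if key.startswith(lang_type):
            return key

    raise ValueError("Invalid OCR configuration.")


# Hypothetical sample mirroring the key layout of the manifest.
sample = {
    "ch_PP-OCRv4_rec_infer.pth": {},
    "ch_PP-OCRv4_rec_server_infer.pth": {},
    "chinese_cht_PP-OCRv3_rec_infer.pth": {},
}
```

Because the match is a plain prefix test, a short code such as `ch` is also a prefix of `chinese_cht_...` keys, so the result depends on the order of the entries in the YAML file.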

+ 6 - 0
python/RapidOCR/python/rapidocr/inference_engine/mnn/__init__.py

@@ -0,0 +1,6 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from .main import MNNError, MNNInferSession
+
+__all__ = ["MNNError", "MNNInferSession"]

+ 83 - 0
python/RapidOCR/python/rapidocr/inference_engine/mnn/main.py

@@ -0,0 +1,83 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+import traceback
+from pathlib import Path
+from typing import List
+
+import MNN
+import numpy as np
+from omegaconf import DictConfig
+
+from ...utils.download_file import DownloadFile, DownloadFileInput
+from ...utils.log import logger
+from ..base import FileInfo, InferSession
+
+
+class MNNInferSession(InferSession):
+    def __init__(self, cfg: DictConfig):
+        super().__init__(cfg)
+
+        model_path = cfg.get("model_path", None)
+        if model_path is None:
+            model_info = self.get_model_url(
+                FileInfo(
+                    engine_type=cfg.engine_type,
+                    ocr_version=cfg.ocr_version,
+                    task_type=cfg.task_type,
+                    lang_type=cfg.lang_type,
+                    model_type=cfg.model_type,
+                )
+            )
+            model_path = self.DEFAULT_MODEL_PATH / Path(model_info["model_dir"]).name
+            DownloadFile.run(
+                DownloadFileInput(
+                    file_url=model_info["model_dir"],
+                    sha256=model_info["SHA256"],
+                    save_path=model_path,
+                    logger=logger,
+                )
+            )
+
+        model_path = Path(model_path)
+        self._verify_model(model_path)
+        self.interpreter = MNN.Interpreter(str(model_path))
+        self.session = self.interpreter.createSession()
+        self.input_tensor = self.interpreter.getSessionInput(self.session)
+
+    def __call__(self, img: np.ndarray) -> np.ndarray:
+        try:
+            self.interpreter.resizeTensor(self.input_tensor, img.shape)
+            self.interpreter.resizeSession(self.session)
+
+            tmp = MNN.Tensor(
+                img.shape,
+                MNN.Halide_Type_Float,
+                img,
+                MNN.Tensor_DimensionType_Caffe,
+            )
+            self.input_tensor.copyFromHostTensor(tmp)
+            self.interpreter.runSession(self.session)
+
+            output = self.interpreter.getSessionOutput(self.session)
+            out_shape = output.getShape()
+            out_tensor = MNN.Tensor(
+                out_shape, MNN.Halide_Type_Float, MNN.Tensor_DimensionType_Caffe
+            )
+            output.copyToHostTensor(out_tensor)
+
+            return np.array(out_tensor.getData()).reshape(out_shape)
+
+        except Exception as e:
+            error_info = traceback.format_exc()
+            raise MNNError(f"MNN inference failed: {error_info}") from e
+
+    def have_key(self, key: str = "character") -> bool:
+        return False
+
+    def get_character_list(self, key: str = "character") -> List[str]:
+        return []
+
+
+class MNNError(Exception):
+    pass

+ 6 - 0
python/RapidOCR/python/rapidocr/inference_engine/onnxruntime/__init__.py

@@ -0,0 +1,6 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from .main import ONNXRuntimeError, OrtInferSession
+
+__all__ = ["ONNXRuntimeError", "OrtInferSession"]

+ 111 - 0
python/RapidOCR/python/rapidocr/inference_engine/onnxruntime/main.py

@@ -0,0 +1,111 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+import os
+import traceback
+from pathlib import Path
+from typing import Any, Dict, List
+
+import numpy as np
+from onnxruntime import GraphOptimizationLevel, InferenceSession, SessionOptions
+
+from ...utils.download_file import DownloadFile, DownloadFileInput
+from ...utils.log import logger
+from ..base import FileInfo, InferSession
+from .provider_config import ProviderConfig
+
+
+class OrtInferSession(InferSession):
+    def __init__(self, cfg: Dict[str, Any]):
+        # support custom session (PR #451)
+        session = cfg.get("session", None)
+        if session is not None:
+            if not isinstance(session, InferenceSession):
+                raise TypeError(
+                    f"Expected session to be an InferenceSession, got {type(session)}"
+                )
+
+            logger.debug("Using the provided InferenceSession for inference.")
+            self.session = session
+            return
+
+        model_path = cfg.get("model_path", None)
+        if model_path is None:
+            # The user did not specify a model, so use the default model
+            model_info = self.get_model_url(
+                FileInfo(
+                    engine_type=cfg.engine_type,
+                    ocr_version=cfg.ocr_version,
+                    task_type=cfg.task_type,
+                    lang_type=cfg.lang_type,
+                    model_type=cfg.model_type,
+                )
+            )
+            model_path = self.DEFAULT_MODEL_PATH / Path(model_info["model_dir"]).name
+            download_params = DownloadFileInput(
+                file_url=model_info["model_dir"],
+                sha256=model_info["SHA256"],
+                save_path=model_path,
+                logger=logger,
+            )
+            DownloadFile.run(download_params)
+
+        logger.info(f"Using {model_path}")
+        model_path = Path(model_path)
+        self._verify_model(model_path)
+
+        sess_opt = self._init_sess_opts(cfg.engine_cfg)
+
+        provider_cfg = ProviderConfig(engine_cfg=cfg.engine_cfg)
+        self.session = InferenceSession(
+            str(model_path),
+            sess_options=sess_opt,
+            providers=provider_cfg.get_ep_list(),
+        )
+        provider_cfg.verify_providers(self.session.get_providers())
+
+    @staticmethod
+    def _init_sess_opts(cfg: Dict[str, Any]) -> SessionOptions:
+        sess_opt = SessionOptions()
+        sess_opt.log_severity_level = 4
+        sess_opt.enable_cpu_mem_arena = cfg.enable_cpu_mem_arena
+        sess_opt.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL
+
+        cpu_nums = os.cpu_count() or 1  # os.cpu_count() may return None
+        intra_op_num_threads = cfg.get("intra_op_num_threads", -1)
+        if intra_op_num_threads != -1 and 1 <= intra_op_num_threads <= cpu_nums:
+            sess_opt.intra_op_num_threads = intra_op_num_threads
+
+        inter_op_num_threads = cfg.get("inter_op_num_threads", -1)
+        if inter_op_num_threads != -1 and 1 <= inter_op_num_threads <= cpu_nums:
+            sess_opt.inter_op_num_threads = inter_op_num_threads
+
+        return sess_opt
+
+    def __call__(self, input_content: np.ndarray) -> np.ndarray:
+        input_dict = dict(zip(self.get_input_names(), [input_content]))
+        try:
+            return self.session.run(self.get_output_names(), input_dict)[0]
+        except Exception as e:
+            error_info = traceback.format_exc()
+            raise ONNXRuntimeError(error_info) from e
+
+    def get_input_names(self) -> List[str]:
+        return [v.name for v in self.session.get_inputs()]
+
+    def get_output_names(self) -> List[str]:
+        return [v.name for v in self.session.get_outputs()]
+
+    def get_character_list(self, key: str = "character") -> List[str]:
+        meta_dict = self.session.get_modelmeta().custom_metadata_map
+        return meta_dict[key].splitlines()
+
+    def have_key(self, key: str = "character") -> bool:
+        meta_dict = self.session.get_modelmeta().custom_metadata_map
+        if key in meta_dict.keys():
+            return True
+        return False
+
+
+class ONNXRuntimeError(Exception):
+    pass
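
Review note: `_init_sess_opts` above only applies a thread setting when the configured value is not `-1` and lies between 1 and the CPU count; otherwise the ONNX Runtime default is left untouched. A minimal standalone sketch of that rule follows. The function name `usable_thread_count` is hypothetical, and treating an unknown CPU count as 1 is an assumption made here, not something the diff states.

```python
import os
from typing import Optional


def usable_thread_count(requested: int, cpu_count: Optional[int] = None) -> Optional[int]:
    """Return `requested` if it should be applied, or None to keep the runtime default."""
    if cpu_count is None:
        # Assumption: fall back to 1 when the platform cannot report a CPU count.
        cpu_count = os.cpu_count() or 1

    if requested != -1 and 1 <= requested <= cpu_count:
        return requested
    return None
```

With this rule, out-of-range values such as `0` or a number above the CPU count are silently ignored rather than rejected, which matches the permissive behaviour of the code above.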

+ 203 - 0
python/RapidOCR/python/rapidocr/inference_engine/onnxruntime/provider_config.py

@@ -0,0 +1,203 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+import platform
+from enum import Enum
+from typing import Any, Dict, List, Sequence, Tuple
+
+from onnxruntime import get_available_providers, get_device
+
+from ...utils.log import logger
+
+
+class EP(Enum):
+    CPU_EP = "CPUExecutionProvider"
+    CUDA_EP = "CUDAExecutionProvider"
+    DIRECTML_EP = "DmlExecutionProvider"
+    CANN_EP = "CANNExecutionProvider"
+    COREML_EP = "CoreMLExecutionProvider"
+
+
+class ProviderConfig:
+    def __init__(self, engine_cfg: Dict[str, Any]):
+        self.had_providers: List[str] = get_available_providers()
+        self.default_provider = self.had_providers[0]
+
+        self.cfg_use_cuda = engine_cfg.get("use_cuda", False)
+        self.cfg_use_dml = engine_cfg.get("use_dml", False)
+        self.cfg_use_cann = engine_cfg.get("use_cann", False)
+        self.cfg_use_coreml = engine_cfg.get("use_coreml", False)
+
+        self.cfg = engine_cfg
+
+    def get_ep_list(self) -> List[Tuple[str, Dict[str, Any]]]:
+        results = [(EP.CPU_EP.value, self.cpu_ep_cfg())]
+
+        if self.is_cuda_available():
+            results.insert(0, (EP.CUDA_EP.value, self.cuda_ep_cfg()))
+
+        if self.is_dml_available():
+            logger.info(
+                "Windows 10 or above detected, try to use DirectML as primary provider"
+            )
+            results.insert(0, (EP.DIRECTML_EP.value, self.dml_ep_cfg()))
+
+        if self.is_cann_available():
+            logger.info("Try to use CANNExecutionProvider to infer")
+            results.insert(0, (EP.CANN_EP.value, self.cann_ep_cfg()))
+
+        if self.is_coreml_available():
+            logger.info("macOS/iOS detected, try to use CoreML as primary provider")
+            results.insert(0, (EP.COREML_EP.value, self.coreml_ep_cfg()))
+
+        return results
+
+    def cpu_ep_cfg(self) -> Dict[str, Any]:
+        return dict(self.cfg.cpu_ep_cfg)
+
+    def cuda_ep_cfg(self) -> Dict[str, Any]:
+        return dict(self.cfg.cuda_ep_cfg)
+
+    def dml_ep_cfg(self) -> Dict[str, Any]:
+        if self.cfg.dm_ep_cfg is not None:
+            return self.cfg.dm_ep_cfg
+
+        if self.is_cuda_available():
+            return self.cuda_ep_cfg()
+        return self.cpu_ep_cfg()
+
+    def cann_ep_cfg(self) -> Dict[str, Any]:
+        return dict(self.cfg.cann_ep_cfg)
+
+    def coreml_ep_cfg(self) -> Dict[str, Any]:
+        return dict(self.cfg.coreml_ep_cfg)
+
+    def verify_providers(self, session_providers: Sequence[str]):
+        if not session_providers:
+            raise ValueError("Session Providers is empty")
+
+        first_provider = session_providers[0]
+
+        providers_to_check = {
+            EP.CUDA_EP: self.is_cuda_available,
+            EP.DIRECTML_EP: self.is_dml_available,
+            EP.CANN_EP: self.is_cann_available,
+            EP.COREML_EP: self.is_coreml_available,
+        }
+
+        for ep, check_func in providers_to_check.items():
+            if check_func() and first_provider != ep.value:
+                logger.warning(
+                    f"{ep.value} is available, but inference automatically fell back to {first_provider}."
+                )
+                logger.warning(f"The session providers are {session_providers}")
+
+    def is_cuda_available(self) -> bool:
+        if not self.cfg_use_cuda:
+            return False
+
+        CUDA_EP = EP.CUDA_EP.value
+        if get_device() == "GPU" and CUDA_EP in self.had_providers:
+            return True
+
+        logger.warning(
+            f"{CUDA_EP} is not in available providers ({self.had_providers}). Use {self.default_provider} inference by default."
+        )
+        install_instructions = [
+            f"If you want to use {CUDA_EP} acceleration, you must do:",
+            "(For reference only) If you want to use GPU acceleration, you must do:",
+            "First, uninstall all onnxruntime packages in current environment.",
+            "Second, install onnxruntime-gpu by `pip install onnxruntime-gpu`.",
+            "Note the onnxruntime-gpu version must match your cuda and cudnn version.",
+            "You can refer to this link: https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html",
+            f"Third, ensure {CUDA_EP} is in available providers list. e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']",
+        ]
+        self.print_log(install_instructions)
+        return False
+
+    def is_dml_available(self) -> bool:
+        if not self.cfg_use_dml:
+            return False
+
+        cur_os = platform.system()
+        if cur_os != "Windows":
+            logger.warning(
+                f"DirectML is only supported in Windows OS. The current OS is {cur_os}. Use {self.default_provider} inference by default.",
+            )
+            return False
+
+        window_build_number_str = platform.version().split(".")[-1]
+        window_build_number = (
+            int(window_build_number_str) if window_build_number_str.isdigit() else 0
+        )
+        if window_build_number < 18362:
+            logger.warning(
+                f"DirectML is only supported in Windows 10 Build 18362 and above OS. The current Windows Build is {window_build_number}. Use {self.default_provider} inference by default.",
+            )
+            return False
+
+        DML_EP = EP.DIRECTML_EP.value
+        if DML_EP in self.had_providers:
+            return True
+
+        logger.warning(
+            f"{DML_EP} is not in available providers ({self.had_providers}). Use {self.default_provider} inference by default."
+        )
+        install_instructions = [
+            "If you want to use DirectML acceleration, you must do:",
+            "First, uninstall all onnxruntime packages in current environment.",
+            "Second, install onnxruntime-directml by `pip install onnxruntime-directml`",
+            f"Third, ensure {DML_EP} is in available providers list. e.g. ['DmlExecutionProvider', 'CPUExecutionProvider']",
+        ]
+        self.print_log(install_instructions)
+        return False
+
+    def is_cann_available(self) -> bool:
+        if not self.cfg_use_cann:
+            return False
+
+        CANN_EP = EP.CANN_EP.value
+        if CANN_EP in self.had_providers:
+            return True
+
+        logger.warning(
+            f"{CANN_EP} is not in available providers ({self.had_providers}). Use {self.default_provider} inference by default."
+        )
+        install_instructions = [
+            "If you want to use CANN acceleration, you must do:",
+            "First, ensure you have installed Huawei Ascend software stack.",
+            "Second, install onnxruntime with CANN support by following the instructions at:",
+            "\thttps://onnxruntime.ai/docs/execution-providers/community-maintained/CANN-ExecutionProvider.html",
+            f"Third, ensure {CANN_EP} is in available providers list. e.g. ['CANNExecutionProvider', 'CPUExecutionProvider']",
+        ]
+        self.print_log(install_instructions)
+        return False
+
+    def is_coreml_available(self) -> bool:
+        if not self.cfg_use_coreml:
+            return False
+
+        cur_os = platform.system()
+        if cur_os != "Darwin":
+            logger.warning(
+                f"CoreML is only supported in macOS/iOS. The current OS is {cur_os}. Use {self.default_provider} inference by default.",
+            )
+            return False
+
+        COREML_EP = EP.COREML_EP.value
+        if COREML_EP in self.had_providers:
+            return True
+
+        logger.warning(
+            f"{COREML_EP} is not in available providers ({self.had_providers}). Use {self.default_provider} inference by default."
+        )
+        install_instructions = [
+            "The standard onnxruntime package for macOS includes CoreML support.",
+            f"Ensure {COREML_EP} is in available providers list. e.g. ['CoreMLExecutionProvider', 'CPUExecutionProvider']",
+        ]
+        self.print_log(install_instructions)
+        return False
+
+    def print_log(self, log_list: List[str]):
+        for log_info in log_list:
+            logger.info(log_info)
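
The Windows build check in `is_dml_available` takes the last dot-separated field of `platform.version()` and treats anything non-numeric as build 0. A minimal standalone sketch of that parsing follows. The helper names `parse_windows_build` and `build_supports_dml` are hypothetical and not part of the diff; the threshold 18362 is taken from the code above.

```python
MIN_DML_BUILD = 18362  # minimum Windows 10 build checked in the diff above


def parse_windows_build(version: str) -> int:
    """Return the trailing numeric field of a version string, or 0 if absent."""
    last_field = version.split(".")[-1]
    return int(last_field) if last_field.isdigit() else 0


def build_supports_dml(version: str) -> bool:
    """True when the parsed build number meets the DirectML threshold."""
    return parse_windows_build(version) >= MIN_DML_BUILD
```

Falling back to 0 means an unparseable version string is treated as "too old", so the caller takes the default provider rather than raising.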

+ 6 - 0
python/RapidOCR/python/rapidocr/inference_engine/openvino/__init__.py

@@ -0,0 +1,6 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from .main import OpenVINOError, OpenVINOInferSession
+
+__all__ = ["OpenVINOError", "OpenVINOInferSession"]

+ 51 - 0
python/RapidOCR/python/rapidocr/inference_engine/openvino/device_config.py

@@ -0,0 +1,51 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+import os
+from typing import Any, Dict
+
+from omegaconf import DictConfig
+
+from ...utils.log import logger
+
+
+class CPUConfig:
+    """Configuration handler for OpenVINO CPU execution."""
+
+    def __init__(self, engine_cfg: DictConfig):
+        self.cfg = engine_cfg
+
+    def get_config(self) -> Dict[str, Any]:
+        """Build OpenVINO CPU configuration dictionary."""
+        config = {}
+
+        infer_num_threads = self.cfg.get("inference_num_threads", -1)
+        # os.cpu_count() may return None, so fall back to 1 before comparing.
+        if infer_num_threads != -1 and 1 <= infer_num_threads <= (os.cpu_count() or 1):
+            config["INFERENCE_NUM_THREADS"] = str(infer_num_threads)
+
+        performance_hint = self.cfg.get("performance_hint", None)
+        if performance_hint is not None:
+            config["PERFORMANCE_HINT"] = str(performance_hint)
+
+        performance_num_requests = self.cfg.get("performance_num_requests", -1)
+        if performance_num_requests != -1:
+            config["PERFORMANCE_HINT_NUM_REQUESTS"] = str(performance_num_requests)
+
+        enable_cpu_pinning = self.cfg.get("enable_cpu_pinning", None)
+        if enable_cpu_pinning is not None:
+            config["ENABLE_CPU_PINNING"] = str(enable_cpu_pinning)
+
+        num_streams = self.cfg.get("num_streams", -1)
+        if num_streams != -1:
+            config["NUM_STREAMS"] = str(num_streams)
+
+        enable_hyper_threading = self.cfg.get("enable_hyper_threading", None)
+        if enable_hyper_threading is not None:
+            config["ENABLE_HYPER_THREADING"] = str(enable_hyper_threading)
+
+        scheduling_core_type = self.cfg.get("scheduling_core_type", None)
+        if scheduling_core_type is not None:
+            config["SCHEDULING_CORE_TYPE"] = str(scheduling_core_type)
+
+        logger.info(f"Using OpenVINO config: {config}")
+        return config
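
`CPUConfig.get_config` copies a setting into the property dictionary only when it differs from its "not set" sentinel (`-1` or `None`), and stringifies the value. The sketch below shows the same pattern on a plain mapping, covering only a subset of the keys. The function name `build_cpu_config` and the `cpu_count` parameter are assumptions added for illustration and testability; they are not part of the diff.

```python
import os
from typing import Any, Dict, Mapping, Optional


def build_cpu_config(
    cfg: Mapping[str, Any], cpu_count: Optional[int] = None
) -> Dict[str, str]:
    """Translate optional settings into an OpenVINO-style property dict."""
    # os.cpu_count() can return None, so fall back to 1.
    limit = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    config: Dict[str, str] = {}

    # The thread count is only accepted when it fits the machine.
    threads = cfg.get("inference_num_threads", -1)
    if 1 <= threads <= limit:
        config["INFERENCE_NUM_THREADS"] = str(threads)

    # (config key, property name, sentinel meaning "not set")
    optional = [
        ("performance_hint", "PERFORMANCE_HINT", None),
        ("performance_num_requests", "PERFORMANCE_HINT_NUM_REQUESTS", -1),
        ("num_streams", "NUM_STREAMS", -1),
    ]
    for key, prop, unset in optional:
        value = cfg.get(key, unset)
        if value != unset:
            config[prop] = str(value)
    return config
```

For example, `build_cpu_config({"num_streams": 2}, cpu_count=4)` yields a dictionary with a single `NUM_STREAMS` entry, and an empty mapping yields an empty dictionary.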

+ 103 - 0
python/RapidOCR/python/rapidocr/inference_engine/openvino/main.py

@@ -0,0 +1,103 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+import traceback
+from pathlib import Path
+from typing import List
+
+import numpy as np
+from omegaconf import DictConfig
+
+try:
+    from openvino import Core
+except ImportError:
+    from openvino.runtime import Core
+
+from ...utils.download_file import DownloadFile, DownloadFileInput
+from ...utils.log import logger
+from ..base import FileInfo, InferSession
+from .device_config import CPUConfig
+
+
+class OpenVINOInferSession(InferSession):
+    def __init__(self, cfg: DictConfig):
+        super().__init__(cfg)
+
+        core = Core()
+
+        model_path = cfg.get("model_path", None)
+        if model_path is None:
+            model_info = self.get_model_url(
+                FileInfo(
+                    engine_type=cfg.engine_type,
+                    ocr_version=cfg.ocr_version,
+                    task_type=cfg.task_type,
+                    lang_type=cfg.lang_type,
+                    model_type=cfg.model_type,
+                )
+            )
+            model_path = self.DEFAULT_MODEL_PATH / Path(model_info["model_dir"]).name
+            download_params = DownloadFileInput(
+                file_url=model_info["model_dir"],
+                sha256=model_info["SHA256"],
+                save_path=model_path,
+                logger=logger,
+            )
+            DownloadFile.run(download_params)
+
+        logger.info(f"Using {model_path}")
+        model_path = Path(model_path)
+        self._verify_model(model_path)
+
+        # Use dedicated config class
+        cpu_config = CPUConfig(cfg.get("engine_cfg", {}))
+        core.set_property("CPU", cpu_config.get_config())
+
+        self.model = core.read_model(model_path)
+        compile_model = core.compile_model(model=self.model, device_name="CPU")
+        self.session = compile_model.create_infer_request()
+
+    def __call__(self, input_content: np.ndarray) -> np.ndarray:
+        try:
+            self.session.infer(inputs=[input_content])
+            return self.session.get_output_tensor().data
+        except Exception as e:
+            error_info = traceback.format_exc()
+            raise OpenVINOError(error_info) from e
+
+    def have_key(self, key: str = "character") -> bool:
+        try:
+            self.get_character_list(key)
+            return True
+        except OpenVINOError:
+            return False
+
+    def get_character_list(self, key: str = "character") -> List[str]:
+        framework_info = self.get_rt_info_framework()
+        if framework_info is None:
+            raise OpenVINOError("Failed to get runtime framework info")
+
+        if key not in framework_info:
+            raise OpenVINOError(f"Key '{key}' not found in framework info")
+
+        val = framework_info[key]
+        if not hasattr(val, "value"):
+            raise OpenVINOError(
+                f"Invalid value object for key '{key}': missing 'value' attribute"
+            )
+
+        value = getattr(val, "value", None)
+        if value is None:
+            raise OpenVINOError(f"Value is None for key '{key}'")
+
+        return value.splitlines()
+
+    def get_rt_info_framework(self):
+        rt_info = self.model.get_rt_info()
+        if "framework" not in rt_info:
+            return None
+        return rt_info["framework"]
+
+
+class OpenVINOError(Exception):
+    pass
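
`get_character_list` walks the model's runtime metadata in three steps: find the `framework` section, find the requested key, then split its `value` into lines. The sketch below reproduces that lookup on plain Python objects so it can run without OpenVINO. `MetadataError`, `read_character_list`, and the `sample` data are hypothetical stand-ins; the real `rt_info` object returned by OpenVINO is not a plain dictionary.

```python
from types import SimpleNamespace
from typing import Any, List, Mapping


class MetadataError(Exception):
    """Hypothetical stand-in for OpenVINOError."""


def read_character_list(
    rt_info: Mapping[str, Any], key: str = "character"
) -> List[str]:
    framework = rt_info.get("framework")
    if framework is None:
        raise MetadataError("Failed to get runtime framework info")
    if key not in framework:
        raise MetadataError(f"Key '{key}' not found in framework info")
    # A missing attribute and a None value are handled the same way here.
    value = getattr(framework[key], "value", None)
    if value is None:
        raise MetadataError(f"No usable value for key '{key}'")
    return value.splitlines()


# Illustrative metadata: one character per line, as splitlines() implies.
sample = {"framework": {"character": SimpleNamespace(value="a\nb\nc")}}
```

Raising a dedicated exception type lets a caller such as `have_key` turn any lookup failure into a simple `False`.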

+ 6 - 0
python/RapidOCR/python/rapidocr/inference_engine/paddle/__init__.py

@@ -0,0 +1,6 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from .main import PaddleInferError, PaddleInferSession
+
+__all__ = ["PaddleInferError", "PaddleInferSession"]

+ 120 - 0
python/RapidOCR/python/rapidocr/inference_engine/paddle/device_config.py

@@ -0,0 +1,120 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+import os
+import platform
+
+import paddle
+
+from ...utils.log import logger
+from ...utils.typings import OCRVersion
+
+
+class DeviceConfig:
+    def __init__(self, cfg, infer_opts, ocr_version):
+        self.use_cuda = cfg.engine_cfg.use_cuda
+        self.use_npu = cfg.engine_cfg.use_npu
+
+        self.cfg = cfg
+        self.infer_opts = infer_opts
+        self.ocr_version = ocr_version
+
+    def setup_device(self):
+        if self.use_cuda:
+            self.config_cuda(self.cfg.engine_cfg.cuda_ep_cfg)
+            return self.infer_opts
+
+        if self.use_npu:
+            self.config_npu(self.cfg.engine_cfg.npu_ep_cfg)
+            return self.infer_opts
+
+        self.config_cpu()
+        return self.infer_opts
+
+    def config_cuda(self, ep_cfg):
+        if not self.check_cuda():
+            raise DeviceConfigError("CUDA is not available.")
+
+        gpu_id = self.get_infer_gpuid()
+        if gpu_id is None:
+            raise DeviceConfigError(
+                "CUDA is not found in current device by nvidia-smi. Please check your device or ignore it if run on jetson."
+            )
+
+        self.infer_opts.enable_use_gpu(ep_cfg.gpu_mem, ep_cfg.device_id)
+        logger.info(f"Using CUDA device with ID: {ep_cfg.device_id}")
+
+    def config_npu(self, ep_cfg):
+        self.setup_device_envs(ep_cfg.envs)
+
+        npu_id = ep_cfg.device_id
+        self.infer_opts.enable_custom_device("npu", npu_id)
+        logger.info(f"Using NPU device with ID: {npu_id}")
+
+    def config_cpu(self):
+        self.infer_opts.disable_gpu()
+        if hasattr(self.infer_opts, "disable_mkldnn"):
+            self.infer_opts.disable_mkldnn()
+        logger.info("Using CPU device")
+
+        cpu_nums = os.cpu_count() or 1  # os.cpu_count() may return None
+        infer_num_threads = self.cfg.get("cpu_math_library_num_threads", -1)
+        if infer_num_threads != -1 and 1 <= infer_num_threads <= cpu_nums:
+            self.infer_opts.set_cpu_math_library_num_threads(infer_num_threads)
+
+            logger.info(f"Set CPU math library threads to: {infer_num_threads}")
+
+        if self.ocr_version == OCRVersion.PPOCRV5:
+            if hasattr(self.infer_opts, "enable_new_ir"):
+                self.infer_opts.enable_new_ir(True)
+
+            if hasattr(self.infer_opts, "enable_new_executor"):
+                self.infer_opts.enable_new_executor()
+            if hasattr(self.infer_opts, "set_optimization_level"):
+                self.infer_opts.set_optimization_level(3)
+
+    @staticmethod
+    def setup_device_envs(envs):
+        for key, val in envs.items():
+            os.environ[key] = str(val)
+            logger.info(f"{key} has been set to {val}.")
+
+    @staticmethod
+    def check_cuda() -> bool:
+        if paddle.is_compiled_with_cuda():
+            device = paddle.get_device()
+            if device.startswith("gpu"):
+                logger.info(f"GPU is available. Current device: {device}")
+                return True
+
+            logger.warning(
+                "PaddlePaddle was compiled with CUDA support, but no GPU is currently available."
+            )
+            return False
+
+        logger.warning(
+            "PaddlePaddle was not compiled with CUDA support (CPU-only version)."
+        )
+        return False
+
+    @staticmethod
+    def get_infer_gpuid() -> int:
+        sysstr = platform.system()
+        if sysstr == "Windows":
+            return 0
+
+        if not paddle.device.is_compiled_with_rocm():
+            cmd = "env | grep CUDA_VISIBLE_DEVICES"
+        else:
+            cmd = "env | grep HIP_VISIBLE_DEVICES"
+        env_cuda = os.popen(cmd).readlines()
+
+        if len(env_cuda) == 0:
+            return 0
+
+        gpu_id = env_cuda[0].strip().split("=")[1]
+        return int(gpu_id[0])
+
+
+class DeviceConfigError(Exception):
+    pass
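
`get_infer_gpuid` shells out to `env | grep` to read the visible-devices variable. As a possible alternative, the same value can be read from `os.environ` directly, which also works where `grep` is unavailable. The sketch below is not the diff's implementation: `first_visible_gpu_id` is a hypothetical name, and unlike the original (which keeps only the first character of the value) it parses the whole first comma-separated entry.

```python
import os
from typing import Mapping, Optional


def first_visible_gpu_id(
    env: Optional[Mapping[str, str]] = None, rocm: bool = False
) -> int:
    """Return the first device id listed in *_VISIBLE_DEVICES, or 0."""
    env = os.environ if env is None else env
    name = "HIP_VISIBLE_DEVICES" if rocm else "CUDA_VISIBLE_DEVICES"
    raw = env.get(name, "").strip()
    first = raw.split(",")[0].strip()
    # Empty values and non-numeric entries (e.g. GPU UUIDs) fall back to 0.
    return int(first) if first.isdigit() else 0
```

Passing the environment as a parameter keeps the function easy to test; calling it with no arguments reads the real process environment.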

+ 150 - 0
python/RapidOCR/python/rapidocr/inference_engine/paddle/main.py

@@ -0,0 +1,150 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from pathlib import Path
+from typing import List, Optional, Tuple
+
+import numpy as np
+from omegaconf.errors import ConfigKeyError
+from paddle import inference
+
+from ...utils.download_file import DownloadFile, DownloadFileInput
+from ...utils.log import logger
+from ...utils.typings import OCRVersion
+from ..base import FileInfo, InferSession
+from .device_config import DeviceConfig
+
+
+class PaddleInferSession(InferSession):
+    def __init__(self, cfg, mode: Optional[str] = None) -> None:
+        self.mode = mode
+
+        pdmodel_path, pdiparams_path = self.setup_model(cfg)
+        infer_opts = inference.Config(str(pdmodel_path), str(pdiparams_path))
+        infer_opts = DeviceConfig(cfg, infer_opts, cfg.ocr_version).setup_device()
+
+        self.predictor = self.init_predictor(infer_opts, cfg.ocr_version)
+
+    def __call__(self, img: np.ndarray):
+        input_tensor = self.get_input_tensors()
+        output_tensors = self.get_output_tensors()
+
+        input_tensor.copy_from_cpu(img)
+        self.predictor.run()
+
+        outputs = []
+        for output_tensor in output_tensors:
+            output = output_tensor.copy_to_cpu()
+            outputs.append(output)
+
+        self.predictor.try_shrink_memory()
+        return outputs[0]
+
+    def get_input_tensors(self):
+        input_names = self.predictor.get_input_names()
+        for name in input_names:
+            input_tensor = self.predictor.get_input_handle(name)
+        return input_tensor
+
+    def get_output_tensors(self):
+        output_names = self.predictor.get_output_names()
+        if self.mode == "rec":
+            output_name = "softmax_0.tmp_0"
+            if output_name in output_names:
+                return [self.predictor.get_output_handle(output_name)]
+
+        output_tensors = []
+        for output_name in output_names:
+            output_tensor = self.predictor.get_output_handle(output_name)
+            output_tensors.append(output_tensor)
+        return output_tensors
+
+    def setup_model(self, cfg) -> Tuple[Path, Path]:
+        pdmodel_name = "inference.json"
+        pdmodel_name_v2 = "inference.pdmodel"
+        pdiparams_name = "inference.pdiparams"
+
+        model_dir = cfg.get("model_dir", None)
+        if model_dir is None:
+            model_info = self.get_model_url(
+                FileInfo(
+                    engine_type=cfg.engine_type,
+                    ocr_version=cfg.ocr_version,
+                    task_type=cfg.task_type,
+                    lang_type=cfg.lang_type,
+                    model_type=cfg.model_type,
+                )
+            )
+            default_model_dir = model_info["model_dir"]
+
+            try:
+                pdmodel_path = self.download_model(
+                    model_info, default_model_dir, pdmodel_name
+                )
+            except ConfigKeyError:
+                pdmodel_path = self.download_model(
+                    model_info, default_model_dir, pdmodel_name_v2
+                )
+            except Exception as e:
+                raise PaddleInferError(f"Download model error: {e}") from e
+
+            pdiparams_path = self.download_model(
+                model_info, default_model_dir, pdiparams_name
+            )
+
+            logger.info(f"Using {pdmodel_path}")
+            logger.info(f"Using {pdiparams_path}")
+            return pdmodel_path, pdiparams_path
+
+        model_dir = Path(model_dir)
+        pdmodel_path = model_dir / pdmodel_name
+        pdiparams_path = model_dir / pdiparams_name
+        if not pdmodel_path.exists():
+            pdmodel_path = model_dir / pdmodel_name_v2
+        self._verify_model(pdmodel_path)
+        self._verify_model(pdiparams_path)
+
+        logger.info(f"Using {pdmodel_path}")
+        logger.info(f"Using {pdiparams_path}")
+        return pdmodel_path, pdiparams_path
+
+    def download_model(
+        self, model_info, default_model_dir: str, model_file_name: str
+    ) -> Path:
+        model_file_url = f"{default_model_dir}/{model_file_name}"
+        model_file_path = (
+            self.DEFAULT_MODEL_PATH / Path(default_model_dir).name / model_file_name
+        )
+        DownloadFile.run(
+            DownloadFileInput(
+                file_url=model_file_url,
+                sha256=model_info[model_file_name],
+                save_path=model_file_path,
+                logger=logger,
+            )
+        )
+        return model_file_path
+
+    def init_predictor(self, infer_opts, ocr_version):
+        if ocr_version == OCRVersion.PPOCRV5:
+            infer_opts.enable_memory_optim()
+            return inference.create_predictor(infer_opts)
+
+        infer_opts.enable_memory_optim()
+        infer_opts.disable_glog_info()
+        infer_opts.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass")
+        infer_opts.delete_pass("matmul_transpose_reshape_fuse_pass")
+        infer_opts.switch_use_feed_fetch_ops(False)
+        infer_opts.switch_ir_optim(True)
+
+        return inference.create_predictor(infer_opts)
+
+    def have_key(self, key: str = "character") -> bool:
+        return False
+
+    def get_character_list(self, key: str = "character") -> List[str]:
+        return []
+
+
+class PaddleInferError(Exception):
+    pass
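
`setup_model` prefers the newer `inference.json` graph file and falls back to `inference.pdmodel` when the first is missing. The selection rule on its own can be sketched as below. `choose_model_file` and `PREFERRED_MODEL_NAMES` are hypothetical names, and raising `FileNotFoundError` is an assumption of this sketch; the diff instead leaves the final check to `_verify_model`.

```python
from typing import Iterable

# Newer format first, legacy format second (file names from the diff above).
PREFERRED_MODEL_NAMES = ("inference.json", "inference.pdmodel")


def choose_model_file(available: Iterable[str]) -> str:
    """Pick the first preferred model file name present in `available`."""
    names = set(available)
    for candidate in PREFERRED_MODEL_NAMES:
        if candidate in names:
            return candidate
    raise FileNotFoundError(f"none of {PREFERRED_MODEL_NAMES} found")
```

Keeping the preference order in a tuple makes it a one-line change if another model format needs to be tried.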

+ 6 - 0
python/RapidOCR/python/rapidocr/inference_engine/pytorch/__init__.py

@@ -0,0 +1,6 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from .main import TorchInferError, TorchInferSession
+
+__all__ = ["TorchInferError", "TorchInferSession"]

+ 100 - 0
python/RapidOCR/python/rapidocr/inference_engine/pytorch/device_config.py

@@ -0,0 +1,100 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from pathlib import Path
+
+import torch
+
+from ...utils.log import logger
+from ...utils.typings import DeviceType
+from ...utils.utils import mkdir
+
+cur_dir = Path(__file__).resolve().parent
+root_dir = cur_dir.parent.parent
+model_dir = root_dir / "models"
+
+
+class DeviceConfig:
+    def __init__(self, cfg):
+        self.use_cuda = cfg.engine_cfg.use_cuda
+        self.use_mps = cfg.engine_cfg.use_mps
+        self.use_npu = cfg.engine_cfg.use_npu
+
+        self.cfg = cfg
+
+    def setup_device(self):
+        if self.use_cuda:
+            device_id = self.cfg.engine_cfg.cuda_ep_cfg.device_id
+            return self.get_device(DeviceType.CUDA, device_id)
+
+        if self.use_mps:
+            return self.get_device(DeviceType.MPS)
+
+        if self.use_npu:
+            device_id = self.cfg.engine_cfg.npu_ep_cfg.device_id
+            return self.get_device(DeviceType.NPU, device_id)
+
+        return self.get_device(DeviceType.CPU)
+
+    def get_device(
+        self, device_type: DeviceType = DeviceType.CPU, device_id: int = 0
+    ) -> torch.device:
+        if device_type == DeviceType.CPU:
+            return self.config_cpu()
+
+        if device_type == DeviceType.CUDA:
+            return self.config_cuda(device_id)
+
+        if device_type == DeviceType.MPS:
+            return self.config_mps()
+
+        if device_type == DeviceType.NPU:
+            return self.config_npu(device_id)
+
+        raise DeviceConfigError(f"Unsupported device type: {device_type}")
+
+    def config_cpu(self) -> torch.device:
+        logger.info("Using CPU device")
+        return torch.device("cpu")
+
+    def config_cuda(self, device_id: int) -> torch.device:
+        if not torch.cuda.is_available():
+            raise DeviceConfigError("CUDA is not available.")
+
+        logger.info(f"Using GPU device with ID: {device_id}")
+        return torch.device(f"cuda:{device_id}")
+
+    def config_mps(self) -> torch.device:
+        if not torch.backends.mps.is_available():
+            raise DeviceConfigError("MPS is not available.")
+
+        logger.info("Using MPS device")
+        return torch.device("mps")
+
+    def config_npu(self, device_id: int) -> torch.device:
+        try:
+            import torch_npu
+        except ImportError as e:
+            raise ImportError(
+                "torch_npu is not installed. \n"
+                "Please refer to https://github.com/Ascend/pytorch to see how to install."
+            ) from e
+
+        if not torch_npu.npu.is_available():
+            raise DeviceConfigError("NPU is not available.")
+
+        kernel_meta_dir = (model_dir / "kernel_meta").resolve()
+        mkdir(kernel_meta_dir)
+
+        options = {
+            "ACL_OP_COMPILER_CACHE_MODE": "enable",
+            "ACL_OP_COMPILER_CACHE_DIR": str(kernel_meta_dir),
+        }
+        torch_npu.npu.set_option(options)
+
+        logger.info(f"Using NPU device with ID: {device_id}")
+        return torch.device(f"npu:{device_id}")
+
+
+class DeviceConfigError(Exception):
+    pass
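
`setup_device` checks the flags in a fixed order, so CUDA wins over MPS, MPS over NPU, and CPU is the fallback. The sketch below shows only that priority rule and returns a device string instead of a `torch.device`, so it runs without PyTorch. `resolve_device` and its parameters are hypothetical, and the availability checks done by `config_cuda`, `config_mps`, and `config_npu` are deliberately left out.

```python
def resolve_device(
    use_cuda: bool = False,
    use_mps: bool = False,
    use_npu: bool = False,
    cuda_id: int = 0,
    npu_id: int = 0,
) -> str:
    """Return the device string chosen by the first enabled flag."""
    if use_cuda:
        return f"cuda:{cuda_id}"
    if use_mps:
        return "mps"
    if use_npu:
        return f"npu:{npu_id}"
    return "cpu"
```

Because the order is fixed, enabling several flags at once is not an error: `resolve_device(use_cuda=True, use_mps=True)` selects CUDA.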

+ 37 - 0
python/RapidOCR/python/rapidocr/inference_engine/pytorch/main.py

@@ -0,0 +1,37 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com
+from typing import List
+
+import numpy as np
+import torch
+
+from ..base import InferSession
+from .device_config import DeviceConfig
+from .networks.main import ModelLoader
+
+
+class TorchInferSession(InferSession):
+    def __init__(self, cfg) -> None:
+        self.device = DeviceConfig(cfg).setup_device()
+        self.predictor = ModelLoader(cfg, self.device).predictor
+
+    def __call__(self, img: np.ndarray):
+        with torch.no_grad():
+            inp = torch.from_numpy(img)
+            inp = inp.to(self.device)
+            outputs = self.predictor(inp).cpu().numpy()
+            if self.device.type == "mps":
+                del inp
+                torch.mps.empty_cache()
+            return outputs
+
+    def have_key(self, key: str = "character") -> bool:
+        return False
+
+    def get_character_list(self, key: str = "character") -> List[str]:
+        return []
+
+
+class TorchInferError(Exception):
+    pass

+ 3 - 0
python/RapidOCR/python/rapidocr/inference_engine/pytorch/networks/__init__.py

@@ -0,0 +1,3 @@
+# -*- encoding: utf-8 -*-
+# @Author: SWHL
+# @Contact: liekkaskono@163.com

+ 451 - 0
python/RapidOCR/python/rapidocr/inference_engine/pytorch/networks/arch_config.yaml

@@ -0,0 +1,451 @@
+ch_PP-OCRv5_rec_mobile_infer:
+  model_type: rec
+  algorithm: SVTR_LCNet
+  Transform:
+  Backbone:
+    name: PPLCNetV3
+    scale: 0.95
+  Head:
+    name: MultiHead
+    out_channels_list:
+      CTCLabelDecode: 18385 #'blank' + ...(18384) + ' '
+    head_list:
+      - CTCHead:
+          Neck:
+            name: svtr
+            dims: 120
+            depth: 2
+            hidden_dims: 120
+            kernel_size: [1, 3]
+            use_guide: True
+          Head:
+            fc_decay: 0.00001
+      - NRTRHead:
+          nrtr_dim: 384
+          max_text_length: 25
+
+ch_PP-OCRv5_rec_server_infer:
+  model_type: rec
+  algorithm: SVTR_HGNet
+  Transform:
+  Backbone:
+    name: PPHGNetV2_B4
+    text_rec: True
+  Head:
+    name: MultiHead
+    out_channels_list:
+      CTCLabelDecode: 18385 #'blank' + ...(18384) + ' '
+    head_list:
+      - CTCHead:
+          Neck:
+            name: svtr
+            dims: 120
+            depth: 2
+            hidden_dims: 120
+            kernel_size: [1, 3]
+            use_guide: True
+          Head:
+            fc_decay: 0.00001
+      - NRTRHead:
+          nrtr_dim: 384
+          max_text_length: 25
+
+ch_PP-OCRv5_det_mobile_infer:
+  model_type: det
+  algorithm: DB
+  Transform: null
+  Backbone:
+    name: PPLCNetV3
+    scale: 0.75
+    det: True
+  Neck:
+    name: RSEFPN
+    out_channels: 96
+    shortcut: True
+  Head:
+    name: DBHead
+    k: 50
+    fix_nan: True
+
+ch_PP-OCRv5_det_server_infer:
+  model_type: det
+  algorithm: DB
+  Transform: null
+  Backbone:
+    name: PPHGNetV2_B4
+    det: True
+  Neck:
+    name: LKPAN
+    out_channels: 256
+    intracl: true
+  Head:
+    name: PFHeadLocal
+    k: 50
+    mode: "large"
+
+ch_ptocr_mobile_v2.0_cls_infer:
+  model_type: cls
+  algorithm: CLS
+  Transform:
+  Backbone:
+    name: MobileNetV3
+    scale: 0.35
+    model_name: small
+  Neck:
+  Head:
+    name: ClsHead
+    class_dim: 2
+
+Multilingual_PP-OCRv3_det_infer:
+  model_type: det
+  algorithm: DB
+  Transform:
+  Backbone:
+    name: MobileNetV3
+    scale: 0.5
+    model_name: large
+    disable_se: True
+  Neck:
+    name: RSEFPN
+    out_channels: 96
+    shortcut: True
+  Head:
+    name: DBHead
+    k: 50
+
+en_PP-OCRv3_det_infer:
+  model_type: det
+  algorithm: DB
+  Transform:
+  Backbone:
+    name: MobileNetV3
+    scale: 0.5
+    model_name: large
+    disable_se: True
+  Neck:
+    name: RSEFPN
+    out_channels: 96
+    shortcut: True
+  Head:
+    name: DBHead
+    k: 50
+
+en_PP-OCRv4_rec_infer:
+  model_type: rec
+  algorithm: SVTR_LCNet
+  Transform:
+  Backbone:
+    name: PPLCNetV3
+    scale: 0.95
+  Head:
+    name: MultiHead
+    out_channels_list:
+      CTCLabelDecode: 97 #'blank' + ...(62) + ' '
+    head_list:
+      - CTCHead:
+          Neck:
+            name: svtr
+            dims: 120
+            depth: 2
+            hidden_dims: 120
+            kernel_size: [ 1, 3 ]
+            use_guide: True
+          Head:
+            fc_decay: 0.00001
+      - NRTRHead:
+          nrtr_dim: 384
+          max_text_length: 25
+
+ch_PP-OCRv4_det_infer:
+  model_type: det
+  algorithm: DB
+  Transform: null
+  Backbone:
+    name: PPLCNetV3
+    scale: 0.75
+    det: True
+  Neck:
+    name: RSEFPN
+    out_channels: 96
+    shortcut: True
+  Head:
+    name: DBHead
+    k: 50
+
+ch_PP-OCRv4_det_server_infer:
+  model_type: det
+  algorithm: DB
+  Transform: null
+  Backbone:
+    name: PPHGNet_small
+    det: True
+  Neck:
+    name: LKPAN
+    out_channels: 256
+    intracl: true
+  Head:
+    name: PFHeadLocal
+    k: 50
+    mode: "large"
+
+ch_PP-OCRv4_rec_infer:
+  model_type: rec
+  algorithm: SVTR_LCNet
+  Transform:
+  Backbone:
+    name: PPLCNetV3
+    scale: 0.95
+  Head:
+    name: MultiHead
+    out_channels_list:
+      CTCLabelDecode: 6625 #'blank' + ...(6623) + ' '
+    head_list:
+      - CTCHead:
+          Neck:
+            name: svtr
+            dims: 120
+            depth: 2
+            hidden_dims: 120
+            kernel_size: [ 1, 3 ]
+            use_guide: True
+          Head:
+            fc_decay: 0.00001
+      - NRTRHead:
+          nrtr_dim: 384
+          max_text_length: 25
+
+ch_PP-OCRv4_rec_server_infer:
+  model_type: rec
+  algorithm: SVTR_HGNet
+  Transform:
+  Backbone:
+    name: PPHGNet_small
+  Head:
+    name: MultiHead
+    out_channels_list:
+      CTCLabelDecode: 6625 #'blank' + ...(6623) + ' '
+    head_list:
+      - CTCHead:
+          Neck:
+            name: svtr
+            dims: 120
+            depth: 2
+            hidden_dims: 120
+            kernel_size: [ 1, 3 ]
+            use_guide: True
+          Head:
+            fc_decay: 0.00001
+      - NRTRHead:
+          nrtr_dim: 384
+          max_text_length: 25
+
+chinese_cht_PP-OCRv3_rec_infer:
+  model_type: rec
+  algorithm: SVTR
+  Transform:
+  Backbone:
+    name: MobileNetV1Enhance
+    scale: 0.5
+    last_conv_stride: [1, 2]
+    last_pool_type: avg
+  Neck:
+    name: SequenceEncoder
+    encoder_type: svtr
+    dims: 64
+    depth: 2
+    hidden_dims: 120
+    use_guide: True
+  Head:
+    name: CTCHead
+    out_channels: 8423
+    fc_decay: 0.00001
+
+latin_PP-OCRv3_rec_infer:
+  model_type: rec
+  algorithm: SVTR
+  Transform:
+  Backbone:
+    name: MobileNetV1Enhance
+    scale: 0.5
+    last_conv_stride: [ 1, 2 ]
+    last_pool_type: avg
+  Neck:
+    name: SequenceEncoder
+    encoder_type: svtr
+    dims: 64
+    depth: 2
+    hidden_dims: 120
+    use_guide: True
+  Head:
+    name: CTCHead
+    out_channels: 187
+    fc_decay: 0.00001
+
+cyrillic_PP-OCRv3_rec_infer:
+  model_type: rec
+  algorithm: SVTR
+  Transform:
+  Backbone:
+    name: MobileNetV1Enhance
+    scale: 0.5
+    last_conv_stride: [ 1, 2 ]
+    last_pool_type: avg
+  Neck:
+    name: SequenceEncoder
+    encoder_type: svtr
+    dims: 64
+    depth: 2
+    hidden_dims: 120
+    use_guide: True
+  Head:
+    name: CTCHead
+    out_channels: 165
+    fc_decay: 0.00001
+
+arabic_PP-OCRv4_rec_infer:
+  model_type: rec
+  algorithm: SVTR
+  Transform:
+  Backbone:
+    name: MobileNetV1Enhance
+    scale: 0.5
+    last_conv_stride: [ 1, 2 ]
+    last_pool_type: avg
+  Neck:
+    name: SequenceEncoder
+    encoder_type: svtr
+    dims: 64
+    depth: 2
+    hidden_dims: 120
+    use_guide: True
+  Head:
+    name: CTCHead
+    out_channels: 164
+    fc_decay: 0.00001
+
+korean_PP-OCRv4_rec_infer:
+  model_type: rec
+  algorithm: SVTR
+  Transform:
+  Backbone:
+    name: MobileNetV1Enhance
+    scale: 0.5
+    last_conv_stride: [ 1, 2 ]
+    last_pool_type: avg
+  Neck:
+    name: SequenceEncoder
+    encoder_type: svtr
+    dims: 64
+    depth: 2
+    hidden_dims: 120
+    use_guide: True
+  Head:
+    name: CTCHead
+    out_channels: 3690
+    fc_decay: 0.00001
+
+japan_PP-OCRv4_rec_infer:
+  model_type: rec
+  algorithm: SVTR
+  Transform:
+  Backbone:
+    name: MobileNetV1Enhance
+    scale: 0.5
+    last_conv_stride: [ 1, 2 ]
+    last_pool_type: avg
+  Neck:
+    name: SequenceEncoder
+    encoder_type: svtr
+    dims: 64
+    depth: 2
+    hidden_dims: 120
+    use_guide: True
+  Head:
+    name: CTCHead
+    out_channels: 4401
+    fc_decay: 0.00001
+
+ta_PP-OCRv4_rec_infer:
+  model_type: rec
+  algorithm: SVTR
+  Transform:
+  Backbone:
+    name: MobileNetV1Enhance
+    scale: 0.5
+    last_conv_stride: [ 1, 2 ]
+    last_pool_type: avg
+  Neck:
+    name: SequenceEncoder
+    encoder_type: svtr
+    dims: 64
+    depth: 2
+    hidden_dims: 120
+    use_guide: True
+  Head:
+    name: CTCHead
+    out_channels: 130
+    fc_decay: 0.00001
+
+te_PP-OCRv4_rec_infer:
+  model_type: rec
+  algorithm: SVTR
+  Transform:
+  Backbone:
+    name: MobileNetV1Enhance
+    scale: 0.5
+    last_conv_stride: [ 1, 2 ]
+    last_pool_type: avg
+  Neck:
+    name: SequenceEncoder
+    encoder_type: svtr
+    dims: 64
+    depth: 2
+    hidden_dims: 120
+    use_guide: True
+  Head:
+    name: CTCHead
+    out_channels: 153
+    fc_decay: 0.00001
+
+ka_PP-OCRv4_rec_infer:
+  model_type: rec
+  algorithm: SVTR
+  Transform:
+  Backbone:
+    name: MobileNetV1Enhance
+    scale: 0.5
+    last_conv_stride: [ 1, 2 ]
+    last_pool_type: avg
+  Neck:
+    name: SequenceEncoder
+    encoder_type: svtr
+    dims: 64
+    depth: 2
+    hidden_dims: 120
+    use_guide: True
+  Head:
+    name: CTCHead
+    out_channels: 155
+    fc_decay: 0.00001
+
+devanagari_PP-OCRv4_rec_infer:
+  model_type: rec
+  algorithm: SVTR
+  Transform:
+  Backbone:
+    name: MobileNetV1Enhance
+    scale: 0.5
+    last_conv_stride: [ 1, 2 ]
+    last_pool_type: avg
+  Neck:
+    name: SequenceEncoder
+    encoder_type: svtr
+    dims: 64
+    depth: 2
+    hidden_dims: 120
+    use_guide: True
+  Head:
+    name: CTCHead
+    out_channels: 169
+    fc_decay: 0.00001
+
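
The comments next to `CTCLabelDecode` describe the class count as one blank token, plus the dictionary characters, plus one space character. For `ch_PP-OCRv4_rec_infer` this gives 1 + 6623 + 1 = 6625. A small sketch of that arithmetic follows; `ctc_out_channels` and `use_space_char` are hypothetical names. The counts quoted in some of the other comments (for example 62 next to 97) do not satisfy the same formula, so those comments may be stale and are not used here.

```python
def ctc_out_channels(dict_size: int, use_space_char: bool = True) -> int:
    """Number of CTC output classes: blank + dictionary (+ optional space)."""
    return 1 + dict_size + (1 if use_space_char else 0)
```

With the dictionary size from the `ch_PP-OCRv4_rec_infer` comment, `ctc_out_channels(6623)` matches the configured value of 6625.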

Some files are not shown because too many files were changed in this diff