Imagedl APIs

`imagedl.imagedl.ImageClient`

ImageClient is a high-level interface for searching and downloading images using different backends (e.g., BaiduImageClient, BingImageClient and GoogleImageClient) registered in ImageClientBuilder.REGISTERED_MODULES. Arguments supported when initializing this class include:

image_source (str, default: BaiduImageClient): Name of the image client backend to use. Must be one of the registered modules in ImageClientBuilder.REGISTERED_MODULES.

init_image_client_cfg (dict or None, default: None): Extra configuration passed to the underlying image client on initialization. It is merged into a default config:

default_image_client_cfg = {
"work_dir": "imagedl_outputs",
"logger_handle": ImageClient.logger_handle,
"type": image_source,
"auto_set_proxies": False,
"random_update_ua": False,
"enable_search_curl_cffi": False,
"enable_download_curl_cffi": False,
"max_retries": 5,
"maintain_session": False,
"disable_print": False,
"freeproxy_settings": None,
"default_search_cookies": None,
"default_download_cookies": None,
}

search_limits (int, default: 1000): Default maximum number of images to retrieve per search. Can be overridden per call in ImageClient.search().
num_threadings (int, default: 5): Default number of threads to use for network requests and downloads. Can be overridden per call in ImageClient.search() and ImageClient.download().
request_overrides (dict or None, default: None): Extra keyword arguments forwarded to requests.get in the underlying image client, e.g., proxies and timeout. These are stored and passed to both ImageClient.search() and ImageClient.download() unless overridden inside the backend.

`ImageClient.startcmdui`

Start an interactive command-line interface (CLI) for searching and downloading images. Intended mainly for end users running imagedl from the terminal.

Behavior:

Repeatedly:
- Prints a banner with basic information (version, work dir, usage help).
- Prompts the user for a search keyword:
  "Please enter keywords for the image search:"
Special inputs:
- q / Q: exit the program.
- r / R: restart and return to the main menu.
Any other input is treated as a search keyword:
- Calls the underlying backend’s search():
- keyword = user input
- search_limits = ImageClient.search_limits
- num_threadings = ImageClient.num_threadings
- request_overrides = ImageClient.request_overrides
- Immediately calls the backend’s download() on the search results.

Example (CLI usage):

python -m imagedl.imagedl

`ImageClient.search`

Perform an image search programmatically using the configured backend. This method only retrieves metadata; it does NOT download any images.

Arguments:

keyword (str): The search query string, e.g. "Eiffel Tower", "golden retriever".
search_limits_overrides (int | None, default: None): Per-call maximum number of images to retrieve. If None, falls back to ImageClient.search_limits.
num_threadings_overrides (int | None, default: None): Per-call override for the number of threads. If None, falls back to ImageClient.num_threadings.
filters (dict | None, default: None): Optional filter configuration passed directly to the backend (e.g., image size, color, type), if supported by the chosen image_source.

Returns:

list: A list of image metadata objects (backend-defined structure) that can be passed directly to ImageClient.download().

Example:

from imagedl.imagedl import ImageClient

client = ImageClient(
    image_source="BaiduImageClient", search_limits=200, num_threadings=10,
)

image_infos = client.search(
    keyword="cute cat", search_limits_overrides=50,
)

`ImageClient.download`

Download images from a list of image metadata entries, typically returned by ImageClient.search().

Arguments:

image_infos (list): A list of image metadata objects returned by ImageClient.search(). Each entry must contain enough information (e.g. URL) for the backend to download the corresponding image.
num_threadings_overrides (int | None, default: None): Per-call override for the number of threads used for downloading. If None, falls back to ImageClient.num_threadings.

Returns:

list: A list of image metadata objects (backend-defined structure) that can be downloaded successfully.

Example:

from imagedl.imagedl import ImageClient

client = ImageClient(work_dir="my_images")

# 1. Search
infos = client.search("Eiffel Tower", search_limits_overrides=30)

# 2. Download
client.download(infos, num_threadings_overrides=8)

`imagedl.imagedl.modules.sources.BaseImageClient`

BaseImageClient is the abstract base class for all image search & download clients in this project. Concrete clients inherit from it and reuse its common logic for:

Session management (headers, cookies, user-agent, retries)
Optional proxy auto-configuration
Multithreaded search and download
Progress bars and logging
Result saving (search_results.pkl, download_results.pkl)

Current implementations built on top of BaseImageClient include:

imagedl.imagedl.modules.sources.BaiduImageClient
imagedl.imagedl.modules.sources.BingImageClient
imagedl.imagedl.modules.sources.DuckduckgoImageClient
imagedl.imagedl.modules.sources.DanbooruImageClient
imagedl.imagedl.modules.sources.DimTownImageClient
imagedl.imagedl.modules.sources.EverypixelImageClient
imagedl.imagedl.modules.sources.FoodiesfeedImageClient
imagedl.imagedl.modules.sources.FreeNatureStockImageClient
imagedl.imagedl.modules.sources.GoogleImageClient
imagedl.imagedl.modules.sources.GelbooruImageClient
imagedl.imagedl.modules.sources.HuabanImageClient
imagedl.imagedl.modules.sources.I360ImageClient
imagedl.imagedl.modules.sources.PixabayImageClient
imagedl.imagedl.modules.sources.PexelsImageClient
imagedl.imagedl.modules.sources.SogouImageClient
imagedl.imagedl.modules.sources.SafebooruImageClient
imagedl.imagedl.modules.sources.UnsplashImageClient
imagedl.imagedl.modules.sources.WeiboImageClient
imagedl.imagedl.modules.sources.YandexImageClient
imagedl.imagedl.modules.sources.YahooImageClient

In most cases, users do not instantiate BaseImageClient directly. Instead, they use high-level wrappers such as BaiduImageClient. However, the external API surface of all clients is the same as BaseImageClient (search + download). Arguments supported when initializing this class include:

auto_set_proxies (bool, default: False): If True, randomly assign a free proxy fetched by freeproxy.ProxiedSessionClient (details refer to FreeProxy) for each request.
random_update_ua (bool, default: False): If True, randomly updates the User-Agent header before each request (using fake_useragent.UserAgent().random), providing additional variability.
enable_search_curl_cffi (bool, default: False): If True, curl_cffi.requests.Session is adopted for each search request.
enable_download_curl_cffi (bool, default: False): If True, curl_cffi.requests.Session is adopted for each download request.
max_retries (int, default: 5): Maximum number of retry attempts in BaseImageClient.get() / BaseImageClient.post() when requests fail or return non-200 HTTP status codes.
maintain_session (bool, default: False): If False: a new requests.Session is created before each request. If True: the same session is reused across requests. Combined with random_update_ua, this controls how “sticky” your session is.
logger_handle (LoggerHandle or None, default: None): Logger used for informational messages and error reporting. If None, a default LoggerHandle instance is created.
disable_print (bool, default: False): If True, suppresses console printing in LoggerHandle (logging still happens internally).
work_dir (str, default: "imagedl_outputs"): Root directory for all outputs produced by this client. Under this directory, the client will create per-source and per-search subfolders, for example:
- imagedl_outputs/BaiduImageClient/2025-11-19-18-30-00 cat/
- Inside each search folder:
- search_results.pkl
- download_results.pkl
- image files: 00000001.jpg, 00000002.png, ...
freeproxy_settings (dict or None, default: None): Arguments passed when instantiating freeproxy.ProxiedSessionClient. If None, defaults to dict(disable_print=True, proxy_sources=['ProxiflyProxiedSession'], max_tries=20, init_proxied_session_cfg={}) when auto_set_proxies=True.
default_search_cookies (dict or None, default: None): Default cookies used for each search request.
default_download_cookies (dict or None, default: None): Default cookies used for each download request.

`BaseImageClient.search`

Argument:

keyword (str): Search keyword / query sent to the image provider (e.g., "Eiffel Tower", "golden retriever").
search_limits (int, default: 1000): Target maximum number of image records to retrieve. Exact behavior depends on how BaseImageClient._constructsearchurls is implemented in the subclass.
num_threadings (int, default: 5): Number of worker threads used to fetch search pages in parallel. Each thread runs BaseImageClient._search, pulling URLs from the shared search_urls list.
filters (dict or None, default: None): Optional filter configuration that subclasses may use to refine search results (e.g., image size, color, type). The structure is client-specific.
request_overrides (dict or None, default: None): Extra keyword arguments forwarded to requests.get for search requests (e.g., timeout, headers, proxies). These are merged on top of the session’s default headers and proxy settings.

Returns:

list of image_info dicts. The exact keys are determined by the subclass, but BaseImageClient expects at least:
- identifier: a unique ID used for deduplication.
- candidate_urls: list of candidate image URLs for downloading.
- After the search pipeline, it also fills:
- work_dir: per-search directory.
- file_path: base file path (without extension) reserved for downloading.

`BaseImageClient.download`

Argument:

image_infos (list): List of image metadata entries produced by BaseImageClient.search(), or loaded from search_results.pkl. Each entry should contain at least:
- work_dir: directory where the image should be saved.
- file_path: base file path (without extension).
- candidate_urls: list of URLs to try when downloading the image.
num_threadings (int, default: 5): Number of worker threads to use for downloading images in parallel.
request_overrides (dict or None, default: None): Extra keyword arguments forwarded to requests.get for download requests (e.g., timeout, per-request headers or proxies). These options override or extend the session-level defaults.

Returns:

list of downloaded_image_info dicts. For each successfully downloaded image:
- file_path is updated to include the actual file extension (e.g. .../00000001.jpg).
- Other fields are copied from the original image_info.

API.md 11 KB Permalink Istoric Crud

Imagedl APIs

imagedl.imagedl.ImageClient

ImageClient.startcmdui

ImageClient.search

ImageClient.download

imagedl.imagedl.modules.sources.BaseImageClient

BaseImageClient.search

BaseImageClient.download

API.md 11 KB

Permalink Istoric Crud

`imagedl.imagedl.ImageClient`

`ImageClient.startcmdui`

`ImageClient.search`

`ImageClient.download`

`imagedl.imagedl.modules.sources.BaseImageClient`

`BaseImageClient.search`

`BaseImageClient.download`