# Imagedl APIs

## `imagedl.imagedl.ImageClient`

`ImageClient` is a high-level interface for searching and downloading images using different backends (*e.g.*, `BaiduImageClient`, `BingImageClient` and `GoogleImageClient`) registered in `ImageClientBuilder.REGISTERED_MODULES`.
Arguments supported when initializing this class include:

- **image_source** (`str`, default: `BaiduImageClient`): Name of the image client backend to use. Must be one of the registered modules in `ImageClientBuilder.REGISTERED_MODULES`.

- **init_image_client_cfg** (`dict` or `None`, default: `None`): Extra configuration passed to the underlying image client on initialization. It is merged into a default config:
  ```python
  default_image_client_cfg = {
    "work_dir": "imagedl_outputs",
    "logger_handle": ImageClient.logger_handle,
    "type": image_source,
    "auto_set_proxies": False,
    "random_update_ua": False,
    "enable_search_curl_cffi": False,
    "enable_download_curl_cffi": False,
    "max_retries": 5,
    "maintain_session": False,
    "disable_print": False,
    "freeproxy_settings": None,
    "default_search_cookies": None,
    "default_download_cookies": None,
  }
  ```

- **search_limits** (`int`, default: `1000`): Default maximum number of images to retrieve per search. Can be overridden per call in `ImageClient.search()`.

- **num_threadings** (`int`, default: `5`): Default number of threads to use for network requests and downloads. Can be overridden per call in `ImageClient.search()` and `ImageClient.download()`.

- **request_overrides** (`dict` or `None`, default: `None`): Extra keyword arguments forwarded to `requests.get` in the underlying image client (*e.g.*, `proxies` and `timeout`).
  They are stored on the instance and forwarded to the backend by both `ImageClient.search()` and `ImageClient.download()`, unless overridden inside the backend.
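The interplay of `image_source` and `init_image_client_cfg` described above can be modeled roughly as follows. This is a self-contained sketch, not imagedl's builder: `build_client`, the toy client classes and the plain-dict `REGISTERED_MODULES` are hypothetical stand-ins, and it assumes a shallow merge in which user-supplied keys override the defaults and the `type` key selects the backend class.

```python
from typing import Any, Dict, Optional


class ToyBaiduClient:
    """Hypothetical stand-in for a backend; it only records its config."""
    def __init__(self, **cfg: Any):
        self.cfg = cfg


class ToyBingClient(ToyBaiduClient):
    pass


# Hypothetical registry mimicking the idea of ImageClientBuilder.REGISTERED_MODULES.
REGISTERED_MODULES = {
    "BaiduImageClient": ToyBaiduClient,
    "BingImageClient": ToyBingClient,
}


def build_client(image_source: str = "BaiduImageClient",
                 init_image_client_cfg: Optional[Dict[str, Any]] = None):
    # A subset of the documented default_image_client_cfg.
    cfg = {
        "work_dir": "imagedl_outputs",
        "type": image_source,
        "max_retries": 5,
        "maintain_session": False,
    }
    # Assumption: shallow merge, user-supplied keys win.
    cfg.update(init_image_client_cfg or {})
    source = cfg.pop("type")
    if source not in REGISTERED_MODULES:
        raise ValueError(f"unknown image_source: {source!r}")
    return REGISTERED_MODULES[source](**cfg)


# Backend options such as work_dir travel through init_image_client_cfg.
client = build_client("BingImageClient", {"work_dir": "my_images"})
```

Selecting the class by a `type` key keeps the wrapper ignorant of individual backends: adding a source only means registering one more entry.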

#### `ImageClient.startcmdui`

Start an interactive command-line interface (CLI) for searching and downloading images. Intended mainly for end users running `imagedl` from the terminal.

Behavior:

- Repeatedly:
  - Prints a banner with basic information (version, work dir, usage help).
  - Prompts the user for a search keyword:  
    "Please enter keywords for the image search:"
- Special inputs:
  - `q` / `Q`: exit the program.
  - `r` / `R`: restart and return to the main menu.
- Any other input is treated as a search keyword:
  - Calls the underlying backend’s `search()`:
    - `keyword` = user input
    - `search_limits` = `ImageClient.search_limits`
    - `num_threadings` = `ImageClient.num_threadings`
    - `request_overrides` = `ImageClient.request_overrides`
  - Immediately calls the backend’s `download()` on the search results.
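The loop above can be sketched as a small dispatcher. This is an illustrative model rather than the actual `startcmdui` code: `run_cli` and `read_input` are hypothetical names, banner printing is omitted, and `search` / `download` are injected as callables so the logic runs without network access.

```python
from typing import Callable, List


def run_cli(read_input: Callable[[str], str],
            search: Callable[[str], List[dict]],
            download: Callable[[List[dict]], List[dict]]) -> List[str]:
    """Toy model of the startcmdui loop; returns a log of the actions taken."""
    actions: List[str] = []
    while True:
        # The real CLI prints a banner before each prompt; omitted here.
        user_input = read_input("Please enter keywords for the image search: ")
        if user_input.lower() == "q":      # q / Q: exit
            actions.append("quit")
            return actions
        if user_input.lower() == "r":      # r / R: back to the main menu
            actions.append("restart")
            continue
        image_infos = search(user_input)   # anything else is a keyword
        download(image_infos)              # download immediately after searching
        actions.append(f"searched+downloaded: {user_input}")


# Scripted session: two searches with a restart in between, then quit.
inputs = iter(["cat", "R", "dog", "q"])
log = run_cli(lambda prompt: next(inputs),
              lambda kw: [{"identifier": kw}],
              lambda infos: infos)
```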

Example (CLI usage):

    python -m imagedl.imagedl

#### `ImageClient.search`

Perform an image search programmatically using the configured backend. This method only retrieves metadata; it does NOT download any images.

Arguments:

- **keyword** (`str`): The search query string, e.g. `"Eiffel Tower"`, `"golden retriever"`.

- **search_limits_overrides** (`int | None`, default: `None`): Per-call maximum number of images to retrieve. If `None`, falls back to `ImageClient.search_limits`.

- **num_threadings_overrides** (`int | None`, default: `None`): Per-call override for the number of threads. If `None`, falls back to `ImageClient.num_threadings`.

- **filters** (`dict | None`, default: `None`): Optional filter configuration passed directly to the backend (*e.g.*, image size, color, type), if supported by the chosen `image_source`.
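The `None` fallback for the two `*_overrides` arguments amounts to choosing between a per-call value and an instance default. The sketch below assumes an explicit `is None` check; `resolve_override` and `ToyImageClient` are hypothetical names, not part of imagedl.

```python
from typing import Optional


def resolve_override(override: Optional[int], default: int) -> int:
    """Return the per-call override if given, otherwise the instance default."""
    # `is None` (rather than `or`) keeps an explicit 0 from silently falling back.
    return default if override is None else override


class ToyImageClient:
    """Hypothetical stand-in holding ImageClient-style defaults."""
    def __init__(self, search_limits: int = 1000, num_threadings: int = 5):
        self.search_limits = search_limits
        self.num_threadings = num_threadings

    def effective_search_args(self, search_limits_overrides: Optional[int] = None,
                              num_threadings_overrides: Optional[int] = None) -> dict:
        return {
            "search_limits": resolve_override(search_limits_overrides, self.search_limits),
            "num_threadings": resolve_override(num_threadings_overrides, self.num_threadings),
        }


# Mirrors the example below: instance defaults 200 / 10, one per-call override.
args = ToyImageClient(search_limits=200, num_threadings=10).effective_search_args(
    search_limits_overrides=50)
```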

Returns:

- `list`: A list of image metadata objects (backend-defined structure) that can be passed directly to `ImageClient.download()`.

Example:

    from imagedl.imagedl import ImageClient

    client = ImageClient(
        image_source="BaiduImageClient", search_limits=200, num_threadings=10,
    )

    image_infos = client.search(
        keyword="cute cat", search_limits_overrides=50,
    )

#### `ImageClient.download`

Download images from a list of image metadata entries, typically returned by `ImageClient.search()`.

Arguments:

- **image_infos** (`list`): A list of image metadata objects returned by `ImageClient.search()`. Each entry must contain enough information (e.g. URL) for the backend to download the corresponding image.

- **num_threadings_overrides** (`int | None`, default: `None`): Per-call override for the number of threads used for downloading. If `None`, falls back to `ImageClient.num_threadings`.

Returns:

- `list`: A list of image metadata objects (backend-defined structure) for the images that were downloaded successfully.

Example:

    from imagedl.imagedl import ImageClient

    # work_dir is a backend option, so it is passed via init_image_client_cfg
    client = ImageClient(init_image_client_cfg={"work_dir": "my_images"})

    # 1. Search
    infos = client.search("Eiffel Tower", search_limits_overrides=30)

    # 2. Download
    client.download(infos, num_threadings_overrides=8)
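The "returns only the successful entries" contract of `download()` can be illustrated with a thread pool. This is a sketch under stated assumptions: `download_all` and `download_one` are hypothetical, and each per-image call is assumed to catch its own errors and return `None` on failure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def download_all(image_infos: List[dict],
                 download_one: Callable[[dict], Optional[dict]],
                 num_threadings: int = 5) -> List[dict]:
    """Download entries in parallel and keep only the ones that succeeded."""
    with ThreadPoolExecutor(max_workers=num_threadings) as pool:
        # pool.map preserves input order; failed downloads come back as None.
        results = list(pool.map(download_one, image_infos))
    return [r for r in results if r is not None]


def fake_download_one(info: dict) -> Optional[dict]:
    """Hypothetical downloader: odd identifiers fail, even ones 'succeed'."""
    if info["identifier"] % 2:
        return None
    return dict(info, file_path=f"{info['file_path']}.jpg")


infos = [{"identifier": i, "file_path": f"out/{i:08d}"} for i in range(1, 5)]
ok = download_all(infos, fake_download_one, num_threadings=2)
```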


## `imagedl.imagedl.modules.sources.BaseImageClient`

`BaseImageClient` is the **abstract base class** for all image search & download clients in this project.
Concrete clients inherit from it and reuse its common logic for:

- Session management (headers, cookies, user-agent, retries)
- Optional proxy auto-configuration
- Multithreaded search and download
- Progress bars and logging
- Result saving (`search_results.pkl`, `download_results.pkl`)
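Because results are persisted as `.pkl` files, a search can be reloaded later and passed to `download()`. The round trip below assumes the files are plain pickled lists of dicts; `save_results` and `load_results` are hypothetical helpers. Note that unpickling runs arbitrary code, so only load result files you created yourself.

```python
import os
import pickle
import tempfile
from typing import List


def save_results(image_infos: List[dict], work_dir: str,
                 filename: str = "search_results.pkl") -> str:
    """Pickle a result list into work_dir and return the file path."""
    os.makedirs(work_dir, exist_ok=True)
    path = os.path.join(work_dir, filename)
    with open(path, "wb") as f:
        pickle.dump(image_infos, f)
    return path


def load_results(path: str) -> List[dict]:
    """Load a previously pickled result list (trusted files only)."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    infos = [{"identifier": "abc", "candidate_urls": ["https://example.com/a.jpg"]}]
    path = save_results(infos, tmp)
    reloaded = load_results(path)
```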

Current implementations built on top of `BaseImageClient` include:

- `imagedl.imagedl.modules.sources.BaiduImageClient`
- `imagedl.imagedl.modules.sources.BingImageClient`
- `imagedl.imagedl.modules.sources.DuckduckgoImageClient`
- `imagedl.imagedl.modules.sources.DanbooruImageClient`
- `imagedl.imagedl.modules.sources.DimTownImageClient`
- `imagedl.imagedl.modules.sources.EverypixelImageClient`
- `imagedl.imagedl.modules.sources.FoodiesfeedImageClient`
- `imagedl.imagedl.modules.sources.FreeNatureStockImageClient`
- `imagedl.imagedl.modules.sources.GoogleImageClient`
- `imagedl.imagedl.modules.sources.GelbooruImageClient`
- `imagedl.imagedl.modules.sources.HuabanImageClient`
- `imagedl.imagedl.modules.sources.I360ImageClient`
- `imagedl.imagedl.modules.sources.PixabayImageClient`
- `imagedl.imagedl.modules.sources.PexelsImageClient`
- `imagedl.imagedl.modules.sources.SogouImageClient`
- `imagedl.imagedl.modules.sources.SafebooruImageClient`
- `imagedl.imagedl.modules.sources.UnsplashImageClient`
- `imagedl.imagedl.modules.sources.WeiboImageClient`
- `imagedl.imagedl.modules.sources.YandexImageClient`
- `imagedl.imagedl.modules.sources.YahooImageClient`

In most cases, users do **not** instantiate `BaseImageClient` directly.
Instead, they use concrete subclasses such as `BaiduImageClient`.
All of these clients expose the same external **API surface** as `BaseImageClient` (`search` + `download`).
Arguments supported when initializing this class include:

- **auto_set_proxies** (`bool`, default: `False`): If `True`, each request is randomly assigned a free proxy fetched by `freeproxy.ProxiedSessionClient` (see [FreeProxy](https://github.com/CharlesPikachu/freeproxy) for details).

- **random_update_ua** (`bool`, default: `False`): If `True`, randomly updates the `User-Agent` header before each request (using `fake_useragent.UserAgent().random`), providing additional variability.

- **enable_search_curl_cffi** (`bool`, default: `False`): If `True`, a `curl_cffi.requests.Session` is used for each search request.

- **enable_download_curl_cffi** (`bool`, default: `False`): If `True`, a `curl_cffi.requests.Session` is used for each download request.

- **max_retries** (`int`, default: `5`): Maximum number of retry attempts in `BaseImageClient.get()` / `BaseImageClient.post()` when requests fail or return non-200 HTTP status codes.

- **maintain_session** (`bool`, default: `False`): If `False`, a new `requests.Session` is created before each request; if `True`, the same session is reused across requests. Combined with `random_update_ua`, this controls how “sticky” your session is.

- **logger_handle** (`LoggerHandle` or `None`, default: `None`): Logger used for informational messages and error reporting. If `None`, a default `LoggerHandle` instance is created.

- **disable_print** (`bool`, default: `False`): If `True`, suppresses console printing in `LoggerHandle` (logging still happens internally).

- **work_dir** (`str`, default: `"imagedl_outputs"`): Root directory for all outputs produced by this client. Under this directory, the client will create per-source and per-search subfolders, for example:
  - `imagedl_outputs/BaiduImageClient/2025-11-19-18-30-00 cat/`
  - Inside each search folder:
    - `search_results.pkl`
    - `download_results.pkl`
    - image files: `00000001.jpg`, `00000002.png`, ...

- **freeproxy_settings** (`dict` or `None`, default: `None`): Arguments passed when instantiating `freeproxy.ProxiedSessionClient`. If `None`, defaults to `dict(disable_print=True, proxy_sources=['ProxiflyProxiedSession'], max_tries=20, init_proxied_session_cfg={})` when `auto_set_proxies=True`.

- **default_search_cookies** (`dict` or `None`, default: `None`): Default cookies used for each search request.

- **default_download_cookies** (`dict` or `None`, default: `None`): Default cookies used for each download request.
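The interaction of `max_retries` and `maintain_session` can be modeled with fake sessions, so no network is needed. This is a toy model, not imagedl's `get()`: `FakeSession`, `FakeResponse` and `ToyRetryingClient` are hypothetical, and it assumes `max_retries` counts total attempts and that exceptions and non-200 responses both consume an attempt.

```python
from typing import Iterator, Optional


class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""
    def __init__(self, status_code: int):
        self.status_code = status_code


class FakeSession:
    """Hypothetical session whose status codes come from a shared iterator."""
    def __init__(self, statuses: Iterator[int]):
        self.statuses = statuses

    def get(self, url: str) -> FakeResponse:
        return FakeResponse(next(self.statuses))


class ToyRetryingClient:
    """Toy model of max_retries + maintain_session semantics."""

    def __init__(self, statuses: Iterator[int], max_retries: int = 5,
                 maintain_session: bool = False):
        self.statuses = statuses
        self.max_retries = max_retries
        self.maintain_session = maintain_session
        self.session = FakeSession(statuses)
        self.sessions_created = 1

    def get(self, url: str) -> Optional[FakeResponse]:
        for _ in range(self.max_retries):
            if not self.maintain_session:
                # A fresh session before each request attempt.
                self.session = FakeSession(self.statuses)
                self.sessions_created += 1
            try:
                resp = self.session.get(url)
            except Exception:
                continue  # treat errors like a failed attempt
            if resp.status_code == 200:
                return resp
        return None  # every attempt failed or returned a non-200 status


client = ToyRetryingClient(iter([500, 503, 200]), max_retries=5)
resp = client.get("https://example.com/search")
```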

#### `BaseImageClient.search`

Arguments:

- **keyword** (`str`): Search keyword / query sent to the image provider (e.g., `"Eiffel Tower"`, `"golden retriever"`).

- **search_limits** (`int`, default: `1000`): Target maximum number of image records to retrieve. Exact behavior depends on how `BaseImageClient._constructsearchurls` is implemented in the subclass.

- **num_threadings** (`int`, default: `5`): Number of worker threads used to fetch search pages in parallel. Each thread runs `BaseImageClient._search`, pulling URLs from the shared `search_urls` list.

- **filters** (`dict` or `None`, default: `None`): Optional filter configuration that subclasses may use to refine search results (*e.g.*, image size, color, type). The structure is client-specific.

- **request_overrides** (`dict` or `None`, default: `None`): Extra keyword arguments forwarded to `requests.get` for search requests (e.g., `timeout`, `headers`, `proxies`). These are merged on top of the session’s default headers and proxy settings.
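The threading model described above (workers pulling URLs from a shared `search_urls` list, results deduplicated by `identifier`) can be sketched with `threading` and a lock. `threaded_search` and `fetch_page` are hypothetical, and capping the result count at `search_limits` is an assumption, since the exact behavior is subclass-specific.

```python
import threading
from typing import Callable, List


def threaded_search(search_urls: List[str],
                    fetch_page: Callable[[str], List[dict]],
                    num_threadings: int = 5,
                    search_limits: int = 1000) -> List[dict]:
    """Workers pop URLs from a shared list, parse results, dedupe by identifier."""
    search_urls = list(search_urls)  # copy so the caller's list is untouched
    lock = threading.Lock()
    seen = set()
    image_infos: List[dict] = []

    def worker() -> None:
        while True:
            with lock:
                if not search_urls or len(image_infos) >= search_limits:
                    return
                url = search_urls.pop(0)
            for info in fetch_page(url):  # network I/O happens outside the lock
                with lock:
                    if info["identifier"] in seen or len(image_infos) >= search_limits:
                        continue
                    seen.add(info["identifier"])
                    image_infos.append(info)

    threads = [threading.Thread(target=worker) for _ in range(num_threadings)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image_infos


def fake_fetch_page(url: str) -> List[dict]:
    """Hypothetical page parser; adjacent pages overlap by one item."""
    page = int(url.rsplit("=", 1)[1])
    return [{"identifier": f"img{i}", "candidate_urls": [f"https://example.com/{i}.jpg"]}
            for i in range(page * 3, page * 3 + 4)]


urls = [f"https://example.com/search?page={p}" for p in range(4)]
results = threaded_search(urls, fake_fetch_page, num_threadings=3)
```

Result order depends on thread scheduling, so callers should not rely on it.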

Returns:

- `list` of `image_info` dicts. The exact keys are determined by the subclass, but `BaseImageClient` expects at least:
  - `identifier`: a unique ID used for deduplication.
  - `candidate_urls`: list of candidate image URLs for downloading.
  - After the search pipeline, it also fills:
    - `work_dir`: per-search directory.
    - `file_path`: **base** file path (without extension) reserved for downloading.
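Filling `work_dir` and the extension-less `file_path` can be sketched from the folder layout documented under `work_dir`. `assign_paths` is a hypothetical helper, and the timestamp format is inferred from the example folder name `2025-11-19-18-30-00 cat`; passing `now` explicitly keeps the result deterministic.

```python
import os
from datetime import datetime
from typing import List


def assign_paths(image_infos: List[dict], root: str, source: str, keyword: str,
                 now: datetime) -> List[dict]:
    """Give every entry a per-search work_dir and an extension-less file_path."""
    search_dir = os.path.join(root, source, f"{now:%Y-%m-%d-%H-%M-%S} {keyword}")
    for idx, info in enumerate(image_infos, start=1):
        info["work_dir"] = search_dir
        # 8-digit zero-padded names match the documented 00000001.jpg pattern;
        # the extension is appended only after the download succeeds.
        info["file_path"] = os.path.join(search_dir, f"{idx:08d}")
    return image_infos


infos = assign_paths([{"identifier": "a"}, {"identifier": "b"}], "imagedl_outputs",
                     "BaiduImageClient", "cat", datetime(2025, 11, 19, 18, 30, 0))
```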

#### `BaseImageClient.download`

Arguments:

- **image_infos** (`list`): List of image metadata entries produced by `BaseImageClient.search()`, or loaded from `search_results.pkl`. Each entry should contain at least:

  - `work_dir`: directory where the image should be saved.
  - `file_path`: base file path (without extension).
  - `candidate_urls`: list of URLs to try when downloading the image.

- **num_threadings** (`int`, default: `5`): Number of worker threads to use for downloading images in parallel.

- **request_overrides** (`dict` or `None`, default: `None`): Extra keyword arguments forwarded to `requests.get` for **download** requests (*e.g.*, `timeout`, per-request headers or proxies). These options override or extend the session-level defaults.

Returns:

- `list` of `downloaded_image_info` dicts. For each successfully downloaded image:

  - `file_path` is updated to include the actual file extension (e.g. `.../00000001.jpg`).
  - Other fields are copied from the original `image_info`.
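A single-image download that tries `candidate_urls` in order and appends the real extension to the base `file_path` might look like the sketch below. `download_one`, `fetch` and `sniff_extension` are hypothetical; the extension is guessed here from leading magic bytes with a `.jpg` fallback, which is an assumption (the actual client may use HTTP headers or another method).

```python
import os
import tempfile
from typing import Callable, Dict, Optional

# Leading "magic bytes" of a few common image formats.
MAGIC_EXTENSIONS = {
    b"\xff\xd8\xff": ".jpg",
    b"\x89PNG\r\n\x1a\n": ".png",
    b"GIF87a": ".gif",
    b"GIF89a": ".gif",
}


def sniff_extension(data: bytes, fallback: str = ".jpg") -> str:
    for magic, ext in MAGIC_EXTENSIONS.items():
        if data.startswith(magic):
            return ext
    return fallback  # assumption: unknown formats are saved as .jpg


def download_one(image_info: Dict, fetch: Callable[[str], Optional[bytes]]) -> Optional[Dict]:
    """Try candidate_urls in order; on success write the file and return updated info."""
    for url in image_info["candidate_urls"]:
        data = fetch(url)  # assumed to return None (or b"") on failure
        if not data:
            continue
        downloaded = dict(image_info)  # other fields are copied unchanged
        downloaded["file_path"] = image_info["file_path"] + sniff_extension(data)
        with open(downloaded["file_path"], "wb") as f:
            f.write(data)
        return downloaded
    return None


with tempfile.TemporaryDirectory() as tmp:
    info = {"identifier": "x", "work_dir": tmp,
            "file_path": os.path.join(tmp, "00000001"),
            "candidate_urls": ["https://example.com/broken", "https://example.com/ok"]}
    # dict.get returns None for the broken URL, simulating a failed request.
    fake_fetch = {"https://example.com/ok": b"\x89PNG\r\n\x1a\n" + b"\x00" * 8}.get
    result = download_one(info, fake_fetch)
    written = os.path.exists(result["file_path"])
```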