|
- """Joblib is a set of tools to provide **lightweight pipelining in
- Python**. In particular:
- 1. transparent disk-caching of functions and lazy re-evaluation
- (memoize pattern)
- 2. simple parallel computing
- Joblib is optimized to be **fast** and **robust**, in particular on large
- data, and has specific optimizations for `numpy` arrays. It is
- **BSD-licensed**.
- ==================== ===============================================
- **Documentation:** https://joblib.readthedocs.io
- **Download:** https://pypi.python.org/pypi/joblib#downloads
- **Source code:** https://github.com/joblib/joblib
- **Report issues:** https://github.com/joblib/joblib/issues
- ==================== ===============================================
- Vision
- --------
- The vision is to provide tools to easily achieve better performance and
- reproducibility when working with long-running jobs.
- * **Avoid computing the same thing twice**: code is often rerun again and
- again, for instance when prototyping computationally heavy jobs (as in
- scientific development), but hand-crafted solutions to alleviate this
- issue are error-prone and often lead to unreproducible results.
- * **Persist to disk transparently**: efficiently persisting
- arbitrary objects containing large data is hard. Using
- joblib's caching mechanism avoids hand-written persistence and
- implicitly links the file on disk to the execution context of
- the original Python object. As a result, joblib's persistence is
- good for resuming an application status or computational job, e.g.
- after a crash.
- Joblib addresses these problems while **leaving your code and your flow
- control as unmodified as possible** (no framework, no new paradigms).
- Main features
- ------------------
- 1) **Transparent and fast disk-caching of output values:** a memoize or
- make-like functionality for Python functions that works well for
- arbitrary Python objects, including very large numpy arrays. Separate
- persistence and flow-execution logic from domain logic or algorithmic
- code by writing the operations as a set of steps with well-defined
- inputs and outputs: Python functions. Joblib can save their
- computation to disk and rerun it only if necessary::
- >>> from joblib import Memory
- >>> location = 'your_cache_dir_goes_here'
- >>> mem = Memory(location, verbose=1)
- >>> import numpy as np
- >>> a = np.vander(np.arange(3)).astype(float)
- >>> square = mem.cache(np.square)
- >>> b = square(a) # doctest: +ELLIPSIS
- ______________________________________________________________________...
- [Memory] Calling ...square...
- square(array([[0., 0., 1.],
- [1., 1., 1.],
- [4., 2., 1.]]))
- _________________________________________________...square - ...s, 0.0min
- >>> c = square(a)
- >>> # The above call did not trigger an evaluation
- 2) **Embarrassingly parallel helper:** to make it easy to write readable
- parallel code and debug it quickly::
- >>> from joblib import Parallel, delayed
- >>> from math import sqrt
- >>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
- [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
- 3) **Fast compressed persistence**: a replacement for pickle that works
- efficiently on Python objects containing large data
- (*joblib.dump* and *joblib.load*).
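The third feature can be illustrated with a short round trip. This is only a minimal sketch: the temporary directory, the file name `data.joblib`, the dictionary contents, and the `compress=3` level are illustrative assumptions, not recommendations from joblib itself.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load

# A nested Python object mixing plain values with a large numpy array.
# The keys and sizes are made up for this example.
data = {"label": "example", "values": np.arange(1_000_000, dtype=np.float64)}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name inside a throwaway directory.
    path = os.path.join(tmp_dir, "data.joblib")

    # compress=3 is an arbitrary illustrative level; compress=0 (the default)
    # writes the data uncompressed.
    dump(data, path, compress=3)

    # Load the object back into memory before the directory is removed.
    restored = load(path)

# The restored object should match the original one.
same_label = restored["label"] == data["label"]
same_values = np.array_equal(restored["values"], data["values"])
print(same_label, same_values)
```

Higher compression levels generally trade slower writes for smaller files, so the right setting depends on the data and the storage at hand.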
- ..
- >>> import shutil ; shutil.rmtree(location)
- """
- # PEP 440 compatible version string, see:
- # https://www.python.org/dev/peps/pep-0440/
- #
- # Generic release markers:
- # X.Y
- # X.Y.Z # For bugfix releases
- #
- # Admissible pre-release markers:
- # X.YaN # Alpha release
- # X.YbN # Beta release
- # X.YrcN # Release Candidate
- # X.Y # Final release
- #
- # Dev branch marker is: 'X.Y.dev' or 'X.Y.devN' where N is an integer.
- # 'X.Y.dev0' is the canonical version of 'X.Y.dev'
- #
- __version__ = "1.5.3"
- import os
- from ._cloudpickle_wrapper import wrap_non_picklable_objects
- from ._parallel_backends import ParallelBackendBase
- from ._store_backends import StoreBackendBase
- from .compressor import register_compressor
- from .hashing import hash
- from .logger import Logger, PrintTime
- from .memory import MemorizedResult, Memory, expires_after, register_store_backend
- from .numpy_pickle import dump, load
- from .parallel import (
-     Parallel,
-     cpu_count,
-     delayed,
-     effective_n_jobs,
-     parallel_backend,
-     parallel_config,
-     register_parallel_backend,
- )
- __all__ = [
-     # On-disk result caching
-     "Memory",
-     "MemorizedResult",
-     "expires_after",
-     # Parallel code execution
-     "Parallel",
-     "delayed",
-     "cpu_count",
-     "effective_n_jobs",
-     "wrap_non_picklable_objects",
-     # Context to change the backend globally
-     "parallel_config",
-     "parallel_backend",
-     # Helpers to define and register store/parallel backends
-     "ParallelBackendBase",
-     "StoreBackendBase",
-     "register_compressor",
-     "register_parallel_backend",
-     "register_store_backend",
-     # Helpers kept for backward compatibility
-     "PrintTime",
-     "Logger",
-     "hash",
-     "dump",
-     "load",
- ]
- # Workaround for an issue discovered in intel-openmp 2019.5:
- # https://github.com/ContinuumIO/anaconda-issues/issues/11294
- os.environ.setdefault("KMP_INIT_AT_FORK", "FALSE")
|