feat: add complex root finding and dynamic CUDA shared memory optimization

Major update extending the library to solve for complex roots and optimizing GPU performance using Shared Memory.

Complex Number Support:
- Implemented `_solve_complex_cuda` and `_solve_complex_numpy` to find roots in the complex plane.
- Added specialized CUDA kernels (`_FITNESS_KERNEL_COMPLEX`, `_FITNESS_KERNEL_COMPLEX_DYNAMIC`) handling complex arithmetic (multiplication/addition) directly on the GPU.
- Updated `Function` class and `set_coeffs` to handle `np.complex128` data types.
- Updated `quadratic_solve` to return complex roots using `cmath`.

CUDA Performance & Optimization:
- Implemented Dynamic Shared Memory kernels (`extern __shared__`) to cache polynomial coefficients on the GPU block, significantly reducing global memory latency.
- Added intelligent fallback logic: The solver checks `MaxSharedMemoryPerBlock`. If the polynomial is too large for Shared Memory, it falls back to the standard Global Memory kernel to prevent crashes.
- Split complex coefficients into separate Real and Imaginary arrays for CUDA kernel efficiency.

Polynomial Logic:
- Added `_strip_leading_zeros` helper to ensure polynomial degree is correctly maintained after arithmetic operations (e.g., preventing `0x^2 + x` from being treated as degree 2).
- Updated `__init__` to allow direct coefficient injection.

GA Algorithm:
- Updated crossover logic to support 2D search space (Real + Imaginary) for complex solutions.
- Refined fitness function to explicitly handle `isinf`/`isnan` for numerical stability.
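
Illustrative sketch (not the code from this commit): the helpers below show, under stated assumptions, how a few of the changes described above might look in plain Python. All function names and signatures here are hypothetical stand-ins, coefficients are assumed to be ordered highest degree first, and the shared-memory check assumes each coefficient occupies two float64 values (one real, one imaginary).

```python
import cmath


def strip_leading_zeros(coeffs):
    """Drop zero coefficients from the highest-degree end, keeping at least one.

    Hypothetical stand-in for `_strip_leading_zeros`; assumes coefficients
    are ordered highest degree first.
    """
    i = 0
    while i < len(coeffs) - 1 and coeffs[i] == 0:
        i += 1
    return list(coeffs[i:])


def quadratic_roots(a, b, c):
    """Roots of a*x^2 + b*x + c; complex when the discriminant is negative."""
    if a == 0:
        raise ValueError("not a quadratic: leading coefficient is zero")
    # cmath.sqrt accepts negative and complex inputs, unlike math.sqrt.
    sqrt_disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + sqrt_disc) / (2 * a), (-b - sqrt_disc) / (2 * a)


def split_complex(coeffs):
    """Separate coefficients into parallel lists of real and imaginary parts."""
    values = [complex(c) for c in coeffs]
    return [v.real for v in values], [v.imag for v in values]


def fits_in_shared_memory(num_coeffs, max_shared_bytes, bytes_per_value=8):
    """True if the real and imaginary arrays together fit in the block limit.

    Assumption: two float64 arrays of length num_coeffs are cached per block.
    """
    return 2 * num_coeffs * bytes_per_value <= max_shared_bytes


# Usage: 0x^2 + x collapses to degree 1, and x^2 + 1 has roots +i and -i.
print(strip_leading_zeros([0, 1, 0]))  # -> [1, 0]
print(quadratic_roots(1, 0, 1))
print(split_complex([1 + 2j, 3]))
# Hypothetical 48 KiB limit: choose the dynamic kernel only when this is True.
print(fits_in_shared_memory(1000, 48 * 1024))
```

In the fallback described above, a check like `fits_in_shared_memory` would decide between the dynamic shared-memory kernel and the global-memory kernel; the actual limit would come from the device's `MaxSharedMemoryPerBlock` attribute rather than a hard-coded value.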
@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"
 [project]
 # --- Core Metadata ---
 name = "polysolve"
-version = "0.6.3"
+version = "0.7.0"
 authors = [
 { name="Jonathan Rampersad", email="jonathan@jono-rams.work" },
 ]