feat: add complex root finding and dynamic CUDA shared memory optimization
Major update extending the library to solve for complex roots and optimizing GPU performance using Shared Memory.

Complex Number Support:
- Implemented `_solve_complex_cuda` and `_solve_complex_numpy` to find roots in the complex plane.
- Added specialized CUDA kernels (`_FITNESS_KERNEL_COMPLEX`, `_FITNESS_KERNEL_COMPLEX_DYNAMIC`) that handle complex arithmetic (multiplication/addition) directly on the GPU.
- Updated the `Function` class and `set_coeffs` to handle `np.complex128` data types.
- Updated `quadratic_solve` to return complex roots using `cmath` (a minimal `cmath` example follows below).

CUDA Performance & Optimization:
- Implemented Dynamic Shared Memory kernels (`extern __shared__`) to cache polynomial coefficients on the GPU block, significantly reducing global memory latency (a kernel sketch follows below).
- Added intelligent fallback logic: the solver checks `MaxSharedMemoryPerBlock`; if the polynomial is too large for Shared Memory, it falls back to the standard Global Memory kernel to prevent crashes (see the launch sketch below).
- Split complex coefficients into separate Real and Imaginary arrays for CUDA kernel efficiency.

Polynomial Logic:
- Added a `_strip_leading_zeros` helper to ensure the polynomial degree is correctly maintained after arithmetic operations (e.g., preventing `0x^2 + x` from being treated as degree 2); sketched below.
- Updated `__init__` to allow direct coefficient injection.

GA Algorithm:
- Updated crossover logic to support a 2D search space (Real + Imaginary) for complex solutions.
- Refined the fitness function to explicitly handle `isinf`/`isnan` for numerical stability.
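To make the Shared Memory optimization concrete, here is a minimal sketch of what a dynamic shared-memory complex fitness kernel of this shape can look like when written with CuPy. It is not the library's actual `_FITNESS_KERNEL_COMPLEX_DYNAMIC` source: the kernel name, parameter order, and the fitness formula (1 / |P(z)| with an `isinf`/`isnan` guard) are illustrative assumptions; only the `extern __shared__` coefficient caching and the real/imaginary split come from the commit description.

```python
import cupy as cp

# Sketch of a dynamic shared-memory fitness kernel for complex polynomials.
# The coefficient real/imaginary parts are staged into the block's shared
# memory once, then every thread evaluates |P(z)| for its candidate root.
_FITNESS_KERNEL_SKETCH = cp.RawKernel(r'''
extern "C" __global__
void fitness_complex_dynamic(const double* coeffs_re, const double* coeffs_im,
                             const int num_coeffs,
                             const double* cand_re, const double* cand_im,
                             double* fitness, const int n)
{
    // Dynamically sized shared memory: real parts first, imaginary parts after.
    extern __shared__ double s_coeffs[];
    double* s_re = s_coeffs;
    double* s_im = s_coeffs + num_coeffs;

    // Cooperatively cache the coefficients once per block.
    for (int i = threadIdx.x; i < num_coeffs; i += blockDim.x) {
        s_re[i] = coeffs_re[i];
        s_im[i] = coeffs_im[i];
    }
    __syncthreads();

    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;

    // Horner evaluation (highest-degree coefficient first) using explicit
    // complex multiply/add on the candidate z = zr + zi*i.
    const double zr = cand_re[idx], zi = cand_im[idx];
    double ar = s_re[0], ai = s_im[0];
    for (int i = 1; i < num_coeffs; ++i) {
        const double tr = ar * zr - ai * zi + s_re[i];
        const double ti = ar * zi + ai * zr + s_im[i];
        ar = tr;
        ai = ti;
    }

    // Smaller |P(z)| means a better candidate; guard against inf/nan.
    const double mag = sqrt(ar * ar + ai * ai);
    fitness[idx] = (isnan(mag) || isinf(mag)) ? -1.0e300 : 1.0 / (mag + 1.0e-12);
}
''', 'fitness_complex_dynamic')
```

Caching the coefficients once per block means each thread's Horner loop reads them from on-chip shared memory instead of re-fetching them from global memory.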
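The fallback decision described above can then be made at launch time. This is a hedged sketch with placeholder names (`_launch_fitness`, `kernel_dynamic`, and `kernel_global` are not the library's identifiers); the `MaxSharedMemoryPerBlock` check and the global-memory fallback are the parts taken from the commit description.

```python
import cupy as cp

def _launch_fitness(kernel_dynamic, kernel_global,
                    coeffs_re, coeffs_im, cand_re, cand_im, fitness):
    """Pick the shared-memory kernel when the coefficients fit, else fall back."""
    n = cand_re.size
    threads = 256
    blocks = (n + threads - 1) // threads
    # Argument order matches the sketch kernel above.
    args = (coeffs_re, coeffs_im, cp.int32(coeffs_re.size),
            cand_re, cand_im, fitness, cp.int32(n))

    # Two double arrays (real + imaginary parts) must fit in shared memory.
    shared_bytes = 2 * coeffs_re.size * coeffs_re.itemsize
    max_shared = cp.cuda.Device().attributes['MaxSharedMemoryPerBlock']

    if shared_bytes <= max_shared:
        kernel_dynamic((blocks,), (threads,), args, shared_mem=shared_bytes)
    else:
        # Polynomial too large for shared memory: use the global-memory
        # kernel rather than risk a launch failure.
        kernel_global((blocks,), (threads,), args)
```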
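For the `_strip_leading_zeros` bullet, a small sketch under the assumption (consistent with the test fixtures, e.g. `Function(2, [1, 2, 2])`) that coefficients are stored highest-degree first; the real helper's signature and edge-case handling may differ.

```python
import numpy as np

def _strip_leading_zeros(coeffs: np.ndarray) -> np.ndarray:
    """Drop leading zero coefficients so the stated degree matches reality,
    e.g. [0, 1, 0] (0x^2 + x) becomes [1, 0] (degree 1)."""
    nonzero = np.flatnonzero(coeffs)
    if nonzero.size == 0:
        # The zero polynomial: keep a single zero coefficient.
        return coeffs[-1:]
    return coeffs[nonzero[0]:]
```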
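And for the `quadratic_solve` change, a minimal illustration of returning complex roots via the standard formula; the function name and signature here are assumptions, only the use of `cmath` comes from the commit message.

```python
import cmath

def quadratic_solve_sketch(a: complex, b: complex, c: complex):
    """Return both roots of a*x**2 + b*x + c = 0, complex when the
    discriminant is negative (cmath.sqrt never raises on negative input)."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)
```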
@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"
 [project]
 # --- Core Metadata ---
 name = "polysolve"
-version = "0.6.3"
+version = "0.7.0"
 authors = [
   { name="Jonathan Rampersad", email="jonathan@jono-rams.work" },
 ]
File diff suppressed because it is too large.
@@ -43,6 +43,11 @@ def base_func():
     f.set_coeffs([1, 2, 3])
     return f
 
+@pytest.fixture
+def complex_func():
+    f = Function(2, [1, 2, 2])
+    return f
+
 # --- Core Functionality Tests ---
 
 def test_solve_y(quadratic_func):
@@ -162,3 +167,36 @@ def test_get_real_roots_cuda(quadratic_func):
     # Verify that the CUDA implementation also finds the correct roots within tolerance.
     npt.assert_allclose(np.sort(roots), np.sort(expected_roots), atol=1e-2)
 
+def test_get_roots_numpy(complex_func):
+    """
+    Tests that the NumPy-based genetic algorithm approximates the roots correctly.
+    """
+    # Using more generations for higher accuracy in testing
+    ga_opts = GA_Options(num_of_generations=50, data_size=200000, selection_percentile=0.66, root_precision=3)
+
+    roots = complex_func.get_roots(ga_opts, use_cuda=False)
+
+    # Check if the algorithm found values close to the two known roots.
+    # We don't know which order they'll be in, so we check for presence.
+    expected_roots = np.array([-1.0-1.j, -1.0+1.j])
+
+    npt.assert_allclose(np.sort(roots), np.sort(expected_roots), atol=1e-2)
+
+
+@pytest.mark.skipif(not _CUPY_AVAILABLE, reason="CuPy is not installed, skipping CUDA test.")
+def test_get_roots_cuda(complex_func):
+    """
+    Tests that the CUDA-based genetic algorithm approximates the roots correctly.
+    This test implicitly verifies that the CUDA kernel is functioning.
+    It will be skipped automatically if CuPy is not available.
+    """
+
+    ga_opts = GA_Options(num_of_generations=50, data_size=200000, selection_percentile=0.66, root_precision=3)
+
+    roots = complex_func.get_roots(ga_opts, use_cuda=True)
+
+    expected_roots = np.array([-1.0-1.j, -1+1.j])
+
+    # Verify that the CUDA implementation also finds the correct roots within tolerance.
+    npt.assert_allclose(np.sort(roots), np.sort(expected_roots), atol=1e-2)
+