feat: add complex root finding and dynamic CUDA shared memory optimization

Major update that extends the library to solve for complex roots and optimizes GPU performance using dynamic Shared Memory.

Complex Number Support:
- Implemented `_solve_complex_cuda` and `_solve_complex_numpy` to find roots in the complex plane.
- Added specialized CUDA kernels (`_FITNESS_KERNEL_COMPLEX`, `_FITNESS_KERNEL_COMPLEX_DYNAMIC`) handling complex arithmetic (multiplication/addition) directly on the GPU.
- Updated `Function` class and `set_coeffs` to handle `np.complex128` data types.
- Updated `quadratic_solve` to return complex roots using `cmath`.
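
For reference, a minimal standalone sketch of the cmath-based quadratic formula; the actual `quadratic_solve` signature in polysolve may differ:

# Hypothetical sketch only; polysolve's real quadratic_solve may take different arguments.
import cmath

def quadratic_solve(a, b, c):
    """Return both roots of ax^2 + bx + c, complex when the discriminant is negative."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]

# Example: x^2 + 2x + 2 has roots -1 - 1j and -1 + 1j
print(quadratic_solve(1, 2, 2))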

CUDA Performance & Optimization:
- Implemented Dynamic Shared Memory kernels (`extern __shared__`) that cache polynomial coefficients in per-block shared memory on the GPU, significantly reducing global memory latency.
- Added intelligent fallback logic: the solver checks `MaxSharedMemoryPerBlock`; if the coefficients are too large for Shared Memory, it falls back to the standard Global Memory kernel to prevent crashes (see the sketch after this list).
- Split complex coefficients into separate Real and Imaginary arrays for CUDA kernel efficiency.
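
A simplified sketch of the sizing check and fallback, assuming CuPy is available; the kernel source and helper names below are illustrative, not the library's exact code:

# Sketch only: size the dynamic shared-memory request from the coefficient count
# and fall back to the global-memory kernel when it does not fit on this device.
import cupy as cp

_FITNESS_KERNEL_COMPLEX_DYNAMIC_SRC = r'''
extern "C" __global__
void fitness_complex_dynamic(const double* c_re, const double* c_im, int n_coeffs,
                             const double* x_re, const double* x_im,
                             double* out, int n) {
    extern __shared__ double smem[];   // dynamically sized at launch time
    double* s_re = smem;               // cached real coefficients
    double* s_im = smem + n_coeffs;    // cached imaginary coefficients
    for (int i = threadIdx.x; i < n_coeffs; i += blockDim.x) {
        s_re[i] = c_re[i];
        s_im[i] = c_im[i];
    }
    __syncthreads();
    // ... each thread evaluates |P(x)| for its candidate using s_re / s_im ...
}
'''

def _select_fitness_kernel(n_coeffs, dynamic_kernel, global_kernel):
    """Return (kernel, shared_mem_bytes) for this device and polynomial size."""
    needed = 2 * n_coeffs * 8  # two float64 arrays: real + imaginary coefficients
    limit = cp.cuda.Device().attributes['MaxSharedMemoryPerBlock']
    if needed <= limit:
        return dynamic_kernel, needed
    return global_kernel, 0    # too large: use the plain global-memory kernel

With CuPy's `RawKernel`, the dynamic variant would then be launched with `shared_mem=needed`, so each block copies the coefficient arrays once before evaluating its candidates.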

Polynomial Logic:
- Added `_strip_leading_zeros` helper to ensure the polynomial degree is correctly maintained after arithmetic operations (e.g., preventing `0x^2 + x` from being treated as degree 2); a sketch follows this list.
- Updated `__init__` to allow direct coefficient injection.
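
A sketch of the stripping helper, assuming coefficients are stored highest-degree first in a NumPy array (the real helper may differ):

# Sketch only; assumes highest-degree-first coefficient order, as in Function(2, [1, 2, 2]).
import numpy as np

def _strip_leading_zeros(coeffs, tol=1e-12):
    """Drop leading (highest-degree) zero coefficients so the degree stays correct."""
    coeffs = np.atleast_1d(np.asarray(coeffs))
    nonzero = np.flatnonzero(np.abs(coeffs) > tol)
    if nonzero.size == 0:
        return coeffs[-1:]        # the zero polynomial keeps a single term
    return coeffs[nonzero[0]:]

# 0x^2 + 1x + 0 collapses to degree 1, not 2
print(_strip_leading_zeros([0, 1, 0]))  # [1 0]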

GA Algorithm:
- Updated crossover logic to support 2D search space (Real + Imaginary) for complex solutions.
- Refined fitness function to explicitly handle `isinf`/`isnan` for numerical stability.
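
An illustrative sketch of the 2D (real + imaginary) crossover and the non-finite fitness guard; the names and GA details below are assumptions, not the library's exact implementation:

# Sketch only: blend parent pairs on both axes and rank non-finite evaluations worst.
import numpy as np

def crossover_2d(parents_re, parents_im, rng):
    """Blend random parent pairs, keeping each parent's real/imaginary parts paired."""
    idx = rng.integers(0, parents_re.size, size=(2, parents_re.size))
    w = rng.random(parents_re.size)
    child_re = w * parents_re[idx[0]] + (1.0 - w) * parents_re[idx[1]]
    child_im = w * parents_im[idx[0]] + (1.0 - w) * parents_im[idx[1]]
    return child_re, child_im

def fitness(coeffs, xs):
    """Smaller |P(x)| is better; inf/nan evaluations are pushed to the worst rank."""
    err = np.abs(np.polyval(coeffs, xs))
    err[np.isinf(err) | np.isnan(err)] = np.finfo(np.float64).max
    return err
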
Date:   2025-12-05 13:47:29 -04:00
Parent: 602269889b
Commit: b415df2983

3 changed files with 733 additions and 143 deletions

pyproject.toml

@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"
 [project]
 # --- Core Metadata ---
 name = "polysolve"
-version = "0.6.3"
+version = "0.7.0"
 authors = [
     { name="Jonathan Rampersad", email="jonathan@jono-rams.work" },
 ]

File diff suppressed because it is too large.


@@ -43,6 +43,11 @@ def base_func():
    f.set_coeffs([1, 2, 3])
    return f


@pytest.fixture
def complex_func():
    f = Function(2, [1, 2, 2])
    return f


# --- Core Functionality Tests ---

def test_solve_y(quadratic_func):
@@ -162,3 +167,36 @@ def test_get_real_roots_cuda(quadratic_func):
    # Verify that the CUDA implementation also finds the correct roots within tolerance.
    npt.assert_allclose(np.sort(roots), np.sort(expected_roots), atol=1e-2)


def test_get_roots_numpy(complex_func):
    """
    Tests that the NumPy-based genetic algorithm approximates the roots correctly.
    """
    # Using more generations for higher accuracy in testing
    ga_opts = GA_Options(num_of_generations=50, data_size=200000, selection_percentile=0.66, root_precision=3)
    roots = complex_func.get_roots(ga_opts, use_cuda=False)

    # Check if the algorithm found values close to the two known roots.
    # We don't know which order they'll be in, so we check for presence.
    expected_roots = np.array([-1.0-1.j, -1.0+1.j])
    npt.assert_allclose(np.sort(roots), np.sort(expected_roots), atol=1e-2)


@pytest.mark.skipif(not _CUPY_AVAILABLE, reason="CuPy is not installed, skipping CUDA test.")
def test_get_roots_cuda(complex_func):
    """
    Tests that the CUDA-based genetic algorithm approximates the roots correctly.
    This test implicitly verifies that the CUDA kernel is functioning.
    It will be skipped automatically if CuPy is not available.
    """
    ga_opts = GA_Options(num_of_generations=50, data_size=200000, selection_percentile=0.66, root_precision=3)
    roots = complex_func.get_roots(ga_opts, use_cuda=True)
    expected_roots = np.array([-1.0-1.j, -1.0+1.j])

    # Verify that the CUDA implementation also finds the correct roots within tolerance.
    npt.assert_allclose(np.sort(roots), np.sort(expected_roots), atol=1e-2)