Major update extending the library to solve for complex roots and optimizing GPU performance using Shared Memory.
Complex Number Support:
- Implemented `_solve_complex_cuda` and `_solve_complex_numpy` to find roots in the complex plane.
- Added specialized CUDA kernels (`_FITNESS_KERNEL_COMPLEX`, `_FITNESS_KERNEL_COMPLEX_DYNAMIC`) handling complex arithmetic (multiplication/addition) directly on the GPU.
- Updated `Function` class and `set_coeffs` to handle `np.complex128` data types.
- Updated `quadratic_solve` to return complex roots using `cmath`.
CUDA Performance & Optimization:
- Implemented Dynamic Shared Memory kernels (`extern __shared__`) that cache the polynomial coefficients in per-block shared memory, so threads read them from on-chip memory instead of repeatedly fetching them from global memory.
- Added fallback logic: the solver checks `MaxSharedMemoryPerBlock`. If the polynomial's coefficients do not fit in Shared Memory, it falls back to the standard Global Memory kernel to prevent crashes.
- Split complex coefficients into separate Real and Imaginary arrays for CUDA kernel efficiency.
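
The split real/imaginary layout can be illustrated on the CPU. The following is a minimal NumPy sketch, not the library's kernel: `horner_split` is a hypothetical helper, and it assumes coefficients are stored highest degree first. It evaluates the polynomial with Horner's rule using only real arithmetic, which is roughly what a kernel working on two separate float arrays has to do.

```python
import numpy as np

def horner_split(coeffs_re, coeffs_im, x_re, x_im):
    """Evaluate a complex polynomial at complex points using only real
    arithmetic on separate real and imaginary arrays (hypothetical helper)."""
    acc_re = np.zeros_like(x_re, dtype=np.float64)
    acc_im = np.zeros_like(x_im, dtype=np.float64)
    for c_re, c_im in zip(coeffs_re, coeffs_im):
        # acc = acc * x + c, with the complex product expanded by hand
        new_re = acc_re * x_re - acc_im * x_im + c_re
        new_im = acc_re * x_im + acc_im * x_re + c_im
        acc_re, acc_im = new_re, new_im
    return acc_re, acc_im

# x^2 + 1, assumed to be stored highest degree first
coeffs = np.array([1, 0, 1], dtype=np.complex128)
coeffs_re = np.ascontiguousarray(coeffs.real)
coeffs_im = np.ascontiguousarray(coeffs.imag)

x = np.array([1j, -1j, 2.0 + 0j])
val_re, val_im = horner_split(coeffs_re, coeffs_im, x.real, x.imag)
```

Keeping the two parts in separate contiguous arrays means the device code only ever sees plain `double` buffers, at the cost of one extra array transfer.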
Polynomial Logic:
- Added `_strip_leading_zeros` helper to ensure polynomial degree is correctly maintained after arithmetic operations (e.g., preventing `0x^2 + x` from being treated as degree 2).
- Updated `__init__` to allow direct coefficient injection.
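
As an illustration of the leading-zero problem, here is a small sketch of what such a helper might look like. It is an assumption, not the actual `_strip_leading_zeros` implementation: the name `strip_leading_zeros` is hypothetical and coefficients are assumed to be ordered highest degree first.

```python
import numpy as np

def strip_leading_zeros(coeffs):
    """Drop zero coefficients from the high-degree end (hypothetical sketch).

    Assumes coefficients are ordered highest degree first.
    """
    coeffs = np.atleast_1d(np.asarray(coeffs))
    nonzero = np.flatnonzero(coeffs)
    if nonzero.size == 0:
        # Zero polynomial: keep a single constant term so degree is 0
        return coeffs[-1:]
    return coeffs[nonzero[0]:]

stripped = strip_leading_zeros([0, 1, 0])   # 0x^2 + x
degree = len(stripped) - 1
```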
GA Algorithm:
- Updated crossover logic to support 2D search space (Real + Imaginary) for complex solutions.
- Refined fitness function to explicitly handle `isinf`/`isnan` for numerical stability.
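
A rough sketch of the `isinf`/`isnan` guard follows. The ranking formula here (inverse of the absolute error plus a small `eps`) is an assumption made for illustration, and `rank_from_values` is a hypothetical name. The point is only that non-finite evaluations are given the worst possible rank instead of propagating through the population.

```python
import numpy as np

def rank_from_values(values, eps=1e-12):
    """Turn polynomial values f(x) into ranks where larger is better.

    Hypothetical sketch: rank = 1 / (|f(x)| + eps), with inf/nan mapped to 0.
    """
    err = np.abs(np.asarray(values, dtype=np.complex128))
    bad = np.isinf(err) | np.isnan(err)
    ranks = np.zeros(err.shape, dtype=np.float64)
    # Only finite errors get a non-zero rank
    ranks[~bad] = 1.0 / (err[~bad] + eps)
    return ranks

ranks = rank_from_values([0.0, 1.0, np.inf, np.nan, 3 + 4j])
```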
Implements two features for the Function class:
1. Adds the `__eq__` operator (`==`) to allow for logical comparison of two Function objects based on their coefficients.
2. Replaces the standard quadratic formula with a numerically stable version in `quadratic_solve` to prevent "catastrophic cancellation" errors and improve accuracy.
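
For context, one common way to avoid catastrophic cancellation is to compute the larger-magnitude root first and recover the other from the product of the roots, `c / a`. The sketch below shows that technique with `cmath`. It is not necessarily the exact code in `quadratic_solve`, and `stable_quadratic_roots` is a hypothetical standalone function.

```python
import cmath

def stable_quadratic_roots(a, b, c):
    """Roots of a*x^2 + b*x + c without subtracting nearly equal numbers.

    Hypothetical sketch of a cancellation-free quadratic formula.
    """
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    sqrt_disc = cmath.sqrt(b * b - 4 * a * c)
    # Pick the sign for which b and the square root reinforce each other
    q_plus, q_minus = b + sqrt_disc, b - sqrt_disc
    q = -0.5 * (q_plus if abs(q_plus) >= abs(q_minus) else q_minus)
    if q == 0:
        # Only possible when b == 0 and c == 0: double root at the origin
        return (0j, 0j)
    # First root from q, second from the product of roots c / a
    return (complex(q / a), complex(c / q))

r_big, r_small = stable_quadratic_roots(1.0, 1e8, 1.0)
```

With `b = 1e8`, the textbook formula computes the small root as the difference of two nearly equal numbers and loses most of its digits. Here it comes from a division instead.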
### 🚀 Performance (CPU)
* Replaces `np.polyval` with a parallel Numba JIT function (`_calculate_ranks_numba`).
* Replaces $O(N \log N)$ `np.argsort` with $O(N)$ `np.argpartition` in the GA loop.
* Adds `numba` as a core dependency.
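
The `np.argsort` to `np.argpartition` swap can be sketched as follows. This assumes that a larger rank is better and that the GA only needs to know which individuals are in the top `k`, not their order. `top_k_indices` is a hypothetical helper name.

```python
import numpy as np

def top_k_indices(ranks, k):
    """Indices of the k largest ranks, in no particular order."""
    # argpartition places the k largest entries in the last k slots
    # without fully sorting the array
    return np.argpartition(ranks, -k)[-k:]

rng = np.random.default_rng(0)
ranks = rng.random(10_000)
k = 100

via_partition = top_k_indices(ranks, k)
via_sort = np.argsort(ranks)[-k:]
same_members = set(via_partition.tolist()) == set(via_sort.tolist())
```

The selected indices come back unordered, so any code that relied on the best individual being last would need a final small sort over just those `k` entries.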
### 🧠 Robustness (Algorithm)
* Implements Blend Crossover (BLX-$\alpha$) for better, extrapolative exploration.
* Uses a hybrid selection model (top X% for crossover, 100% for mutation) to preserve root niches.
* Adds `selection_percentile` and `blend_alpha` to `GA_Options` for tuning.
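
As a rough illustration of BLX-$\alpha$, each child gene is drawn uniformly from the interval spanned by its two parents, widened on both sides by `alpha` times the parents' distance. The function below is a sketch under that standard definition. Its name and signature are hypothetical, and how `blend_alpha` is actually consumed inside the solver is not shown here.

```python
import numpy as np

def blend_crossover(parents_a, parents_b, alpha, rng):
    """BLX-alpha: sample children from the parents' interval, extended by
    alpha * |a - b| on each side (hypothetical sketch)."""
    lo = np.minimum(parents_a, parents_b)
    hi = np.maximum(parents_a, parents_b)
    span = hi - lo
    return rng.uniform(lo - alpha * span, hi + alpha * span)

rng = np.random.default_rng(42)
parents_a = np.array([-1.0, 0.5, 3.0])
parents_b = np.array([1.0, 0.5, 2.0])
children = blend_crossover(parents_a, parents_b, alpha=0.5, rng=rng)
```

With `alpha = 0` the children always lie between the parents. A positive `alpha` lets them land outside that interval, which is the extrapolative behaviour the entry refers to.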
The previous GA logic returned the "top N" solutions, which caused test failures when the algorithm converged correctly but on only one of several roots (e.g., returning 1000 variations of -1.0).
This commit fixes the root-finding logic to correctly identify and return *all* unique, high-quality roots:
1. **feat(api):** Adds `root_precision` to `GA_Options`. This new parameter (default: 5) allows the user to control the number of decimal places for clustering unique roots.
2. **fix(ga):** Replaces the flawed "top N" logic in both `_solve_x_numpy` and `_solve_x_cuda`. The new process is:
   * Dynamically sets a `quality_threshold` based on the user's `root_precision` (e.g., `precision=5` requires a rank > `1e6`).
   * Filters the *entire* final population for all solutions that meet this quality threshold.
   * Rounds these high-quality solutions to `root_precision`.
   * Returns only the `np.unique()` results.
This ensures the solver returns all distinct roots that meet the accuracy requirements, rather than just the top N variations of a single root.
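
The filter, round, and deduplicate steps can be sketched in a few lines of NumPy. `unique_quality_roots` is a hypothetical helper. Because the rule that derives `quality_threshold` from `root_precision` is not spelled out here, the sketch takes the threshold as an explicit argument rather than guessing the formula.

```python
import numpy as np

def unique_quality_roots(population, ranks, root_precision, quality_threshold):
    """Keep candidates whose rank beats the threshold, round them, and
    return the distinct values (hypothetical sketch)."""
    good = population[ranks > quality_threshold]
    return np.unique(np.round(good, root_precision))

# Toy final population: two near-duplicates of -1, one good 2, one poor 3.5
population = np.array([-1.000001, -0.999999, 2.0000004, 3.5])
ranks = np.array([1e7, 2e7, 5e6, 10.0])

roots = unique_quality_roots(population, ranks,
                             root_precision=5, quality_threshold=1e6)
```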
Reviewed-on: #19
Co-authored-by: Jonathan Rampersad <rampersad.jonathan@gmail.com>
Co-committed-by: Jonathan Rampersad <rampersad.jonathan@gmail.com>
This commit introduces a major enhancement to the genetic algorithm's convergence logic and refactors key parts of the API for better clarity and usability.
- **feat(ga):** Re-implements the GA solver (CPU & CUDA) to use a more robust strategy based on Elitism, Crossover, and Mutation. This replaces the previous, less efficient model and is designed to significantly improve accuracy and convergence speed.
- **feat(api):** Updates `GA_Options` to expose the new GA strategy parameters:
  - Renames `mutation_percentage` to `mutation_strength` for clarity.
  - Adds `elite_ratio`, `crossover_ratio`, and `mutation_ratio`.
  - Includes a `__post_init__` validator to ensure ratios are valid.
- **refactor(api):** Moves `quadratic_solve` from a standalone function to a method of the `Function` class (`f1.quadratic_solve()`). This provides a cleaner, more object-oriented API.
- **docs:** Updates the README, `GA_Options` doc page, and `quadratic_solve` doc page to reflect all API changes, new parameters, and updated usage examples.
- **chore:** Bumps version to 0.4.0.
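
To make the `__post_init__` validation concrete, here is a minimal dataclass sketch. The class name `GAOptionsSketch`, the default values, and the exact rule (each ratio in `[0, 1]`, sum at most 1) are assumptions for illustration. The real `GA_Options` may enforce something different.

```python
from dataclasses import dataclass

@dataclass
class GAOptionsSketch:
    """Hypothetical stand-in for GA_Options; defaults are illustrative only."""
    elite_ratio: float = 0.05
    crossover_ratio: float = 0.45
    mutation_ratio: float = 0.40
    mutation_strength: float = 0.01

    def __post_init__(self):
        ratios = (self.elite_ratio, self.crossover_ratio, self.mutation_ratio)
        # Assumed rule 1: every ratio is a fraction of the population
        if any(not 0.0 <= r <= 1.0 for r in ratios):
            raise ValueError("each ratio must be between 0 and 1")
        # Assumed rule 2: the groups cannot exceed the whole population
        if sum(ratios) > 1.0:
            raise ValueError("elite, crossover and mutation ratios must sum to at most 1")

options = GAOptionsSketch(elite_ratio=0.1, crossover_ratio=0.5, mutation_ratio=0.3)
```

Validating in `__post_init__` means a bad configuration fails when the options object is built, rather than partway through a solver run.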
Reviewed-on: #16
Co-authored-by: Jonathan Rampersad <rampersad.jonathan@gmail.com>
Co-committed-by: Jonathan Rampersad <rampersad.jonathan@gmail.com>