Algorithm
- functional
- memory
- reduction
Functional
Most APIs in
fucntionaltakefuncas a compile time parameter.
elementwise
- Purpose: N-dimensional iteration with automatic SIMD vectorization
- Key Features:
- Abstracts loop nesting and vectorization
- Supports CPU and GPU targets
- Handles boundary conditions automatically
- Chunks inner dimension by SIMD width
- Signature:
elementwise[func, simd_width](shape: IndexList[rank]) - Example:
var shape = Index(10, 10) # 2D: 10x10 alias simd_width = 4 @parameter fn worker[width: Int, rank: Int, alignment: Int](idx: IndexList[rank]): # idx: starting coordinates [row, col] # width: elements to process in inner dimension pass elementwise[worker, simd_width](shape) - Chunking Behavior: For a 10-element dimension with SIMD width 4:
- Processes: 4 + 4 + 1 + 1 (vectorized + cleanup) per row
map
- Purpose: Simple index-based iteration with functional interface
- Signature:
map[func: fn(Int) capturing [origins] -> None](size: Int) - Use Case: Cleaner than raw
forloops when capturing context - Example:
var data = alloc[Int](10)
@parameter
fn worker(idx: Int):
data[idx] += 1
map[worker](10)parallelize
- Purpose: Multi-threaded work distribution for CPU-intensive tasks
- Signature:
parallelize[func: fn(Int) capturing [origins] -> None](num_work_items: Int) - Use Case: Heavy independent computations (e.g., image processing, scientific calc)
- Example:
var results = List[Int](capacity=1000)
# ... (fill list) ...
@parameter
fn process(idx: Int):
# Heavy computation on independent index
results[idx] = expensive_calc(results[idx])
parallelize[process](1000)- Variants:
parallelize(async),sync_parallelize(blocking)
tile
- Purpose: Loop tiling (blocking) for cache locality
- Signature:
tile[func, tile_size_list](offset, limit) - Key Note: Static tiling requires explicit cleanup tile (e.g.,
VariadicList(4, 1)) - Example:
@parameter
fn worker[width: Int](off: Int): ...
# Tile with 4, fallback to 1 for remainder
tile[worker, VariadicList(4, 1)](0, 10)unswitch
- Purpose: Hoists boolean checks out of functions (“compile-time if” at runtime)
- Signature:
unswitch[func](dynamic_bool) - Use Case: Selecting optimized/safe paths without branching inside the loop
- Example:
unswitch[worker](is_fast_mode) # Compiles 2 versions, picks 1tile_and_unswitch
- Purpose: Fused optimization (Fast path for main tiles, Safe path for cleanup)
- Use Case: Aligned SIMD for main loop, unaligned/masked for cleanup
- Example:
# 0..8 runs with True (Fast), 8..10 runs with False (Safe)
tile_and_unswitch[worker, 4](0, 10)tile_middle_unswitch_boundaries
- Purpose: Specialized 3-part tiling for convolutions (Left, Middle, Right)
- Signature:
tile_middle_unswitch_boundaries[func, tile_size, size]() - Use Case: Avoiding padding checks in the center of an image/tensor
- Behavior:
- Left:
func[..., True, False](Left Edge Check) - Middle:
func[..., False, False](No Edge Checks - FAST) - Right:
func[..., False, True](Right Edge Check)
- Left: