SIMD[dtype, size]
dtype = DType.int8 / DType.float64 / ...
Int64 is a SIMD vector of size 1. The SIMD type has methods and operators.
print(
4 * SIMD[DType.int8,4](1,2,3,4)
)
[4, 8, 12, 16]
print(
SIMD[DType.int32,4](1,1,2,2).reduce_add()
)
6
print(
SIMD[DType.bool,4](True,False,True,False).reduce_and()
)
False
The multiply operator (*) works the same way on SIMD vectors of size 1 and of size 32.
SIMD has the __add__ dunder, __init__, and many more methods.
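As a rough mental model only, the lane-wise multiply and the two reductions shown above can be written in plain Python over lists. This is a sketch of the semantics, not Mojo code and not how SIMD is implemented (real SIMD hardware handles all lanes in one instruction). The helpers `scale`, `reduce_add`, and `reduce_and` are illustrative Python functions named after the operations above, and integer overflow of fixed-width types such as int8 is not modelled.

```python
# Plain-Python model of the SIMD examples above (illustrative, not Mojo).

def scale(factor, lanes):
    """Multiply every lane by the same scalar, like 4 * SIMD[...](1, 2, 3, 4)."""
    return [factor * x for x in lanes]

def reduce_add(lanes):
    """Collapse all lanes into a single value by addition."""
    total = 0
    for x in lanes:
        total += x
    return total

def reduce_and(lanes):
    """Collapse boolean lanes into a single value with logical AND."""
    return all(lanes)

print(scale(4, [1, 2, 3, 4]))                   # [4, 8, 12, 16]
print(reduce_add([1, 1, 2, 2]))                 # 6
print(reduce_and([True, False, True, False]))   # False
```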
SIMD on the Stack
var x = SIMD[DType.float64,2](1.5, 2.5)
var y = x.reduce_add()
print(y)
y is a Float64.
SIMD on the Heap
Let’s take for example a pointer to 10 * Int64. Instead of iterating over each element to add them together, it is also possible to do a fast addition with SIMD!
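To make the idea concrete, here is a minimal Python sketch of summing 10 values in chunks instead of one at a time. It only models the access pattern: each chunk stands in for one SIMD vector, and the list comprehension stands in for a single vector add. The function name `chunked_sum` and the chunk width of 4 are assumptions chosen for illustration, not something the Mojo library provides.

```python
# Plain-Python model of a chunked (SIMD-style) sum (illustrative, not Mojo).

def chunked_sum(values, width):
    """Sum `values` by accumulating `width` lanes at a time, then the leftovers."""
    acc = [0] * width                           # one running total per lane
    full = len(values) - len(values) % width    # elements covered by whole chunks
    for start in range(0, full, width):
        chunk = values[start:start + width]     # stands in for one SIMD load
        acc = [a + c for a, c in zip(acc, chunk)]  # stands in for one vector add
    total = sum(acc)                            # final reduction across lanes
    for x in values[full:]:                     # tail that did not fill a chunk
        total += x
    return total

values = list(range(1, 11))   # 10 values: 1..10
print(chunked_sum(values, width=4))  # 55
```

With 10 elements and a width of 4, two full chunks are processed and the last two elements are added one by one.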
DTypePointer
It is like a pointer, but is more specialized for SIMD.
We’ll use alloc, so we’ll have to free.
def main():
    alias amount_of_bytes = 256
    var mem = DTypePointer[DType.uint8].alloc(amount_of_bytes)
    for i in range(amount_of_bytes):
        mem[i] = i  # slower, but a good first step!
SIMD vector
Let’s load the first 8 elements:
var bunch_of_bytes = SIMD[type=DType.uint8, size=8].load(mem)
print(bunch_of_bytes)
[0, 1, 2, 3, 4, 5, 6, 7]
The data is now in a SIMD vector.
width is the size of the SIMD vector, and stride is the step between consecutive loaded elements; it can be combined with an offset.
0◄─────┐
1 │
2◄─────┤
3 │ Stride: 2
4◄─────┤ Width: 4
5 │
6◄─────┤
7 │
│
▼
[0,2,4,6] SIMD[Width:4]

###### A. The concept
```mojo
var stride_like = 2
for i in range(0,8,stride_like):
    print(i)
```
> 0, 2, 4, 6

###### B. The SIMD stride
var separated_by_2 = mem.simd_strided_load[width = 8](
stride = 2
)
print(separated_by_2)
[0, 2, 4, 6, 8, 10, 12, 14]
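The strided load can be modelled in plain Python as picking every `stride`-th element until `width` lanes are filled. This is only a sketch of the semantics, using a list in place of the pointer. The function `strided_load` and its `offset` parameter are hypothetical names for illustration; `offset` models starting the load at a later position in memory, which is one reading of how stride and offset combine.

```python
# Plain-Python model of a strided load (illustrative, not Mojo).

def strided_load(mem, width, stride, offset=0):
    """Return `width` elements, starting at `offset`, spaced `stride` apart."""
    return [mem[offset + lane * stride] for lane in range(width)]

mem = list(range(256))  # same contents as the buffer filled earlier: 0..255

print(strided_load(mem, width=8, stride=2))            # [0, 2, 4, 6, 8, 10, 12, 14]
print(strided_load(mem, width=4, stride=2, offset=1))  # [1, 3, 5, 7]
```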
gather
It gathers the values stored at various positions into a SIMD vector.
for i in range(16):
    mem[i] = i*i
print(
    mem.gather(
        SIMD[DType.int64,4](1, 2, 5, 6)
    )
)
[1, 4, 25, 36]
Here is the gather method of DTypePointer in a visual form:
Memory: 0 10 20 30 40 50
│ │ │ │
└─────┬─┴──┴──┘
Gather 0 │ 3 4 5
▼
[0,30,40,50]
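The same picture can be expressed as a short Python sketch, with a list standing in for memory. It models only what gather returns, not how the hardware fetches the values; the function name `gather` is reused here for a plain Python helper.

```python
# Plain-Python model of gather (illustrative, not Mojo).

def gather(mem, offsets):
    """Read mem[i] for every index i in `offsets`, keeping the order of `offsets`."""
    return [mem[i] for i in offsets]

# The diagram above: indexes 0, 3, 4, 5 of the memory 0, 10, ..., 50.
print(gather([0, 10, 20, 30, 40, 50], [0, 3, 4, 5]))  # [0, 30, 40, 50]

# The earlier example: memory holds i*i, indexes 1, 2, 5, 6.
squares = [i * i for i in range(16)]
print(gather(squares, [1, 2, 5, 6]))                  # [1, 4, 25, 36]
```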
scatter
It assigns new values to various positions. The positions (int64) and the values are each provided in a SIMD vector.
mem.scatter(
offset = SIMD[DType.int64, 2](1,10),
val = SIMD[DType.uint8, 2](0, 0)
)
print(mem[1])
print(mem[10])
0
0
Here is the scatter method of DTypePointer in a visual form:
Memory: 0 10 20 30 40 50
▲ ▲
│ │
│ │
┌──┴─────┘
│ 0 2 Indexes
│ 100 200 Values
scatter
Memory: 100, 10, 200, 30, 40, 50
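As a sketch of the same operation in plain Python, scatter is the mirror image of gather: each value is written to the position given by the matching index. The list stands in for memory, and the helper `scatter` is an illustrative Python function, not the Mojo method itself.

```python
# Plain-Python model of scatter (illustrative, not Mojo).

def scatter(mem, offsets, values):
    """Write values[k] to mem[offsets[k]] for every lane k, modifying mem in place."""
    for index, value in zip(offsets, values):
        mem[index] = value

# The diagram above: write 100 and 200 at indexes 0 and 2.
mem = [0, 10, 20, 30, 40, 50]
scatter(mem, [0, 2], [100, 200])
print(mem)  # [100, 10, 200, 30, 40, 50]
```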
free
alloc gave us some RAM for the program, free gives it back:
mem.free()
Very easy to use:
┌──────────────────────┐
│ RAM │
├──┐ │
└┼─┴───────────────────┘
│
│
▼
alloc
┌──┐
└──┘
Our program has to give that small amount of RAM back,
because another program might need it!