Tags / Equivalents
vmulq_f16() on Arm 64-bit - NEON
VMUL multiplies corresponding elements in two vectors. Elements in the result vector and input vectors have the same width.
_mm512_mul_ph() on Intel 64-bit - AVX512
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
vmulh_f16() on Arm 64-bit - NEON
Floating-point Multiply (scalar). vmulh_f16() takes two scalar float16_t values, multiplies them, and returns the product as a float16_t. Unlike vmul_f16() and vmulq_f16(), it does not operate on a vector of lanes.
_mm256_mul_ph() on Intel 64-bit - AVX512
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
vmul_f16() on Arm 64-bit - NEON
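Floating-point Multiply (vector). This instruction multiplies corresponding half-precision floating-point elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. vmul_f16() operates on 64-bit vectors holding four float16_t lanes.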
_mm_mul_ph() on Intel 64-bit - AVX512
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
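
As a rough illustration of how the intrinsics listed above line up, the sketch below multiplies eight half-precision values with vmulq_f16() on Arm and with _mm_mul_ph() on Intel; both operate on a 128-bit vector of eight 16-bit lanes. The function name mul8_f16, the float-based interface, and the scalar fallback are assumptions made for this example and are not part of either intrinsic set. The feature-test macros are the ones compilers normally define when the corresponding extensions are enabled (for example with -march=armv8.2-a+fp16 or -mavx512fp16 -mavx512vl).

```c
#include <stddef.h>

#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
#include <arm_neon.h>

/* Arm path: vmulq_f16 multiplies 8 half-precision lanes at once. */
void mul8_f16(const float *a, const float *b, float *out)
{
    float16_t ha[8], hb[8], hr[8];
    for (size_t i = 0; i < 8; ++i) {
        ha[i] = (float16_t)a[i];   /* narrow inputs to half precision */
        hb[i] = (float16_t)b[i];
    }
    float16x8_t r = vmulq_f16(vld1q_f16(ha), vld1q_f16(hb));
    vst1q_f16(hr, r);
    for (size_t i = 0; i < 8; ++i)
        out[i] = (float)hr[i];     /* widen results back to float */
}

#elif defined(__AVX512FP16__) && defined(__AVX512VL__)
#include <immintrin.h>

/* Intel path: _mm_mul_ph multiplies 8 half-precision lanes at once. */
void mul8_f16(const float *a, const float *b, float *out)
{
    _Float16 ha[8], hb[8], hr[8];
    for (size_t i = 0; i < 8; ++i) {
        ha[i] = (_Float16)a[i];
        hb[i] = (_Float16)b[i];
    }
    __m128h r = _mm_mul_ph(_mm_loadu_ph(ha), _mm_loadu_ph(hb));
    _mm_storeu_ph(hr, r);
    for (size_t i = 0; i < 8; ++i)
        out[i] = (float)hr[i];
}

#else

/* Hypothetical fallback for targets without FP16 vector arithmetic:
   plain single-precision multiply, so results are not rounded to
   half precision here. */
void mul8_f16(const float *a, const float *b, float *out)
{
    for (size_t i = 0; i < 8; ++i)
        out[i] = a[i] * b[i];
}

#endif
```

The wider Intel forms, _mm256_mul_ph() and _mm512_mul_ph(), follow the same pattern with 16 and 32 lanes respectively, while vmul_f16() covers 4 lanes and vmulh_f16() a single scalar value.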