What is SimSimd?
SimSimd is a very convenient package, available for many languages (JS, Python, C++…), that provides “Hardware-Accelerated Similarity Metrics and Distance Functions” (aka it uses assembly-level SIMD instructions under the hood).
The author made some crazy claims about functions that are 2500x faster, which I wanted to test on this site.
I made an experiment page at /more/experiments/SimSimd-sqeuclidean. However, before you click, be aware that you might not see any speedup on the deployed site (the Vercel build). This is because the hardware this site runs on doesn’t support these functions.
You can check the issue on GitHub.
I therefore coded up a fallback version with sroussey’s help, making it my first “real” open source contribution! It was fun!
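To give an idea of what a fallback looks like, here is a minimal sketch of the pattern: try the hardware-accelerated function first, and drop back to plain JS if it can't be loaded. This is not the code from the pull request. `DistanceFn`, `sqeuclideanFallback`, `pickSqeuclidean` and the `loadNative` parameter are all hypothetical names made up for this illustration, and I am assuming the native loader throws on unsupported hardware.

```typescript
type DistanceFn = (a: Float32Array, b: Float32Array) => number;

// Plain-JS squared Euclidean distance, used when no native build is available.
const sqeuclideanFallback: DistanceFn = (a, b) => {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
};

// `loadNative` is a hypothetical loader standing in for whatever code loads
// the compiled addon. Assumption: it throws when the hardware is unsupported.
const pickSqeuclidean = (loadNative: () => DistanceFn): DistanceFn => {
  try {
    return loadNative();
  } catch {
    return sqeuclideanFallback;
  }
};
```

With something like this, callers always get a working function, they just don't get the speedup when the native build is missing.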
So if you visit the site, the two timings will be about the same. On my M1 Pro, however, there is a massive speedup. Here are the results that I wish you could see (sqeuclidean distance between two random vectors). The “JS/TS” implementation here is effectively the same as the fallback coded up in this or this pull request.
For each array length, the first two lines show the time in seconds, and the last two show the result of both functions (vanillaJS vs simsimd).
(array of length 5)
simsimd: 0.000442374998703599 seconds
vanillaJS: 0.00019359000399708749 seconds
simsimd: 31.263229370117188
vanillaJS: 31.26322858330992
(array of length 500)
simsimd: 0.000941266001202166 seconds
vanillaJS: 0.00515948300762102 seconds
simsimd: 8283.525390625
vanillaJS: 8283.524354264142
(array of length 5000)
simsimd: 0.0026726669962517918 seconds
vanillaJS: 0.014396675003226847 seconds
simsimd: 84422.375
vanillaJS: 84422.39514275036
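As you can see, on small arrays you might be better off using vanilla JS: at length 5 it was about 2.3 times faster than simsimd, while at lengths 500 and 5000 simsimd was more than 5 times faster.
Here is the code I use to generate these results (it runs server-side in the Astro page; only the length-5 case is shown):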
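import { sqeuclidean } from "simsimd";
// generateRandom1DArray, runPerformanceTest and PerformanceTestResult are defined elsewhere (not shown)
const numIterations = 1000;
const small_size = 5;
const vectorA_F32Arr = new Float32Array(generateRandom1DArray(small_size));
const vectorB_F32Arr = new Float32Array(generateRandom1DArray(small_size));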
const runPerformanceTests = (
  algorithm:
    | ((a: Float32Array | Int8Array, b: Float32Array | Int8Array) => number)
    | ((a: number[], b: number[]) => number),
  // Yes this is ugly. I know it is ugly. It used to be
  // algorithm: (a: any, b: any) => number,
  // However biomejs got upset and I got annoyed at it. It really doesn't matter.
  vector1: number[] | Float32Array,
  vector2: number[] | Float32Array,
  iterations: number,
): PerformanceTestResult => {
  let totalTime = 0;
  let totalResult = 0;
  for (let i = 0; i < iterations; i++) {
    const { result, time } = runPerformanceTest(algorithm, vector1, vector2); // imported from another file
    totalResult += result;
    totalTime += time;
  }
  const averageTime = totalTime / iterations;
  const averageResult = totalResult / iterations;
  return { result: averageResult, time: averageTime };
};
const sqeuclideanDistance = (arr1: number[], arr2: number[]) => {
  let result = 0;
  for (let i = 0; i < arr1.length; i++) {
    result += (arr1[i] - arr2[i]) * (arr1[i] - arr2[i]);
  }
  return result;
};
const { result: avgResSimSmall, time: avgTimeSimSmall } = runPerformanceTests(
  sqeuclidean,
  vectorA_F32Arr,
  vectorB_F32Arr,
  numIterations,
);
const { result: avgResMathSmall, time: avgTimeMathSmall } = runPerformanceTests(
  sqeuclideanDistance,
  vectorA_F32Arr,
  vectorB_F32Arr,
  numIterations,
);
const performanceTimesSmall: number[] = [avgTimeSimSmall, avgTimeMathSmall];
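The snippet above relies on a few helpers that live in another file and are not shown. If you want to try something similar yourself, here is a rough sketch of what they could look like. These are hypothetical stand-ins, not the versions used on this site: the random range of [0, 1), the use of `performance.now()`, and the simplified generic typing are all assumptions.

```typescript
interface PerformanceTestResult {
  result: number;
  time: number; // seconds
}

// Hypothetical helper: an array of `size` random numbers in [0, 1).
// The range is an arbitrary choice for this sketch.
const generateRandom1DArray = (size: number): number[] =>
  Array.from({ length: size }, () => Math.random());

// Hypothetical helper: time a single call and report the duration in seconds.
const runPerformanceTest = <V>(
  algorithm: (a: V, b: V) => number,
  vector1: V,
  vector2: V,
): PerformanceTestResult => {
  const start = performance.now(); // milliseconds
  const result = algorithm(vector1, vector2);
  const end = performance.now();
  return { result, time: (end - start) / 1000 };
};
```

Timing a single call like this is noisy, which is why the page averages over 1000 iterations.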