buffer: add fast api for isAscii & isUtf8 #58058

Open: wants to merge 3 commits into base: main

anonrig (Member) commented Apr 28, 2025

Adds V8 Fast API implementations for the `IsUtf8` and `IsAscii` methods.

isAscii benchmark: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1706/

                                                                        confidence improvement accuracy (*)   (**)  (***)
buffers/buffer-isascii.js input='hello world' length='long' n=20000000         ***     37.08 %       ±0.87% ±1.17% ±1.52%
buffers/buffer-isascii.js input='hello world' length='short' n=20000000        ***     36.83 %       ±0.86% ±1.15% ±1.51%

isUtf8 benchmark: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1708/

                                                                             confidence improvement accuracy (*)   (**)  (***)
buffers/buffer-isutf8.js input='∀x∈ℝ: ⌈x⌉ = −⌊−x⌋' length='long' n=20000000         ***      1.95 %       ±0.22% ±0.29% ±0.38%
buffers/buffer-isutf8.js input='∀x∈ℝ: ⌈x⌉ = −⌊−x⌋' length='short' n=20000000        ***     65.95 %       ±2.35% ±3.13% ±4.08%
buffers/buffer-isutf8.js input='regular string' length='long' n=20000000            ***     42.16 %       ±4.05% ±5.39% ±7.02%
buffers/buffer-isutf8.js input='regular string' length='short' n=20000000           ***    121.41 %       ±1.15% ±1.54% ±2.02%
@anonrig anonrig requested review from lemire, jasnell and ronag April 28, 2025 01:23
@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Apr 28, 2025
@anonrig anonrig force-pushed the yagiz/add-fast-api-isascii branch from 6bb4020 to 8432a57 Compare April 28, 2025 01:23
@anonrig anonrig requested review from H4ad and mcollina April 28, 2025 01:34

codecov bot commented Apr 28, 2025

Codecov Report

Attention: Patch coverage is 90.47619% with 2 lines in your changes missing coverage. Please review.

Project coverage is 90.21%. Comparing base (e0cf8ae) to head (ab8d328).
Report is 15 commits behind head on main.

Files with missing lines Patch % Lines
src/node_buffer.cc 90.47% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #58058      +/-   ##
==========================================
+ Coverage   90.18%   90.21%   +0.03%     
==========================================
  Files         630      630              
  Lines      186393   186410      +17     
  Branches    36595    36612      +17     
==========================================
+ Hits       168103   168175      +72     
+ Misses      11090    11058      -32     
+ Partials     7200     7177      -23     
Files with missing lines Coverage Δ
src/node_buffer.cc 67.92% <90.47%> (+0.41%) ⬆️

... and 32 files with indirect coverage changes

Member

@mcollina mcollina left a comment

lgtm

Member

@lemire lemire left a comment

Great work.

@anonrig anonrig force-pushed the yagiz/add-fast-api-isascii branch from b28834c to 2841143 Compare April 29, 2025 16:58
@anonrig anonrig force-pushed the yagiz/add-fast-api-isascii branch from 2841143 to ab8d328 Compare April 29, 2025 17:00
@anonrig anonrig requested review from Renegade334, ronag and lemire April 29, 2025 17:00
Comment on lines +1181 to +1184
auto buffer_ = buffer.As<v8::TypedArray>()->Buffer();
auto buffer_data = buffer_->Data();
return simdutf::validate_utf8(reinterpret_cast<const char*>(buffer_data),
buffer_->ByteLength());
Contributor

This doesn't account for any offset on the backing ArrayBuffer. ArrayBufferViewContents is probably needed here.
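
To illustrate the concern, here is a minimal, self-contained sketch that does not use V8 at all. `View`, `IsAsciiRange`, `CheckWholeBacking`, and `CheckViewOnly` are hypothetical names invented for this example, and the scalar loop only stands in for a SIMD validator. The point is that a typed array can be a window into a larger backing store, so validating from the start of the store for its full length can examine bytes that are not part of the view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a typed-array view: a window into a larger
// backing store. These are illustrative names, not V8 types.
struct View {
  const std::vector<std::uint8_t>* backing;  // the whole backing store
  std::size_t byte_offset;                   // where the view starts
  std::size_t byte_length;                   // how many bytes the view covers
};

// Scalar ASCII check, standing in for a vectorized validator.
bool IsAsciiRange(const std::uint8_t* data, std::size_t length) {
  for (std::size_t i = 0; i < length; ++i) {
    if (data[i] >= 0x80) return false;
  }
  return true;
}

// Same shape as the quoted lines: starts at the beginning of the backing
// store and covers its full length, ignoring the view's offset and length.
bool CheckWholeBacking(const View& view) {
  return IsAsciiRange(view.backing->data(), view.backing->size());
}

// Validates only the bytes the view actually covers.
bool CheckViewOnly(const View& view) {
  assert(view.byte_offset + view.byte_length <= view.backing->size());
  return IsAsciiRange(view.backing->data() + view.byte_offset,
                      view.byte_length);
}
```

With a three-byte backing store `{0xFF, 'o', 'k'}` and a view starting at offset 1 with length 2, `CheckViewOnly` reports ASCII while `CheckWholeBacking` does not, because the latter also reads the `0xFF` byte outside the view. This matters in practice whenever a view shares its backing store with other data, for example one created with `subarray`.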

6 participants