DEV Community: stdlib

Connect with the stdlib community on Zulip

Athan — Wed, 17 Dec 2025 05:45:04 +0000

As the stdlib community has continued to grow and evolve, so has our need for new ways to connect and collaborate (see, for example, our announcement of office hours and a public events calendar). While Gitter's simple, single-channel interface worked well in the early days, it no longer scales with the range of conversations happening around the project. Today we're excited to announce our new Zulip chat, which provides a more full-featured, structured, and searchable space for us to interact.

Why Zulip?

Zulip is open source and generously supports open-source projects like ours with a free cloud plan. Its channel-and-topic model makes it easier to keep discussions focused, follow ongoing threads, and resurface past knowledge through powerful search features.

Anyone can browse the web-public channels of stdlib's Zulip without an account, and you can sign up at any time to join the conversation.

Join and get started

The stdlib Zulip chat is open to all. A welcome bot will greet you when you first join and share some stdlib-specific tips on how to participate effectively. If you're new to Zulip, their getting started guide is an invaluable resource. If you've already used applications such as Slack or Discord, much of the experience will feel familiar.

We encourage you to come say hello in the #introductions channel and take some time to explore other channels and topics that may be of interest to you. If you have any questions about Zulip itself, we've got a channel for that too (#zulip).

The stdlib team is active in the chat, and public messages are the best way to get timely help—no need for routine @-mentions. Asking questions in public is the fastest way to get a response, as more people can help, plus it's likely that someone else will benefit from finding out the answer to your question. The stdlib Code of Conduct applies to all community spaces, including stdlib's Zulip. Should you encounter an issue, Zulip's reporting tools and our moderation team are available.

See you there!

We're looking forward to seeing you in the stdlib Zulip instance! We welcome questions and suggestions as we continue shaping a space that is useful, inclusive, and genuinely supportive for everyone who wants to learn, build, or contribute.

stdlib is an open source software project dedicated to providing a comprehensive suite of robust, high-performance libraries to accelerate your project's development and give you peace of mind knowing that you're depending on expertly crafted, high-quality software.

If you've enjoyed this post, give us a star 🌟 on GitHub and consider financially supporting the project. Your contributions and continued support help ensure the project's long-term success and are greatly appreciated!

Using AI in the development of stdlib

Philipp Burckhardt — Thu, 17 Jul 2025 19:18:41 +0000

Feeling fast, but working slow? A reflection on stdlib's participation in the recent METR study on AI's impact on open-source developer productivity.

I read the results of the recent METR study on "Impact of Early-2025 AI on Experienced Open-Source Developer Productivity" with great interest for two reasons. Firstly, I have been an early adopter of LLM tools. In 2020, I was lucky enough to get access to the private beta of the OpenAI API from then-CTO Greg Brockman and explored the use of AI for education at Carnegie Mellon University. Secondly, because stdlib participated in the METR study, I was personally involved and contributed by working on randomized issues over several months, being allowed to use AI for some tasks and forbidden from using it for others.

Given that stdlib's involvement is central to my perspective, it's worth providing some context on the project. stdlib is a comprehensive open-source standard library for JavaScript and Node.js, with a specific and ambitious goal: to be the fundamental library for numerical and scientific computing on the web. It is a large-scale project with well over 5 million source lines of JavaScript, C, Fortran, and WebAssembly, and composed of thousands of independently consumable packages, bringing the rigor of high-performance mathematics, statistics, and machine learning to the JavaScript ecosystem. Think of it as a foundational layer for data-intensive applications similar to the roles NumPy and SciPy serve in the Python ecosystem. In short, stdlib isn't your average JavaScript project.

A Word of Thanks

Before diving into my reflection, I want to take the opportunity to thank the METR team and especially Nate Rush for giving stdlib the chance to participate in this study with two core stdlib developers, Muhammad Haris and myself. It was a great experience to work with the METR team, and I am eager to see any future studies they will conduct. It is my conviction that, with the entire tech industry being gripped by an AI gold rush, it is incredibly valuable to have a non-profit research institute like METR conduct studies that cut through the noise with actual data.

The Slowdown

The results of the METR study are surprising, clashing with some previously published and very optimistic study results on the impact of generative AI (see, e.g., GitHub and Accenture's 2023 study on the impact of Copilot on developer productivity). Quoting from the Core Result section of the METR study page:

When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.

Rather predictably, the results have led to a lot of discussion on Hacker News and other social channels, with parties on both sides lining up with their pitchforks.

The Perception Gap

I am part of the group of developers who estimated that they were sped up 20%-30% during the study's exit interview. While I like to believe that my productivity didn't suffer while using AI for my tasks, it's quite possible that AI didn't help me as much as I anticipated or maybe even hampered my efforts.

But how can that be? Daily, we are reading about how AI is already revolutionizing the workplace or making software engineers redundant, with companies like Salesforce announcing that they won't be hiring for software engineering positions anymore or online lender Klarna announcing that they were shuttering their entire human customer support in favor of AI.

Many of these stories have turned out to be more hyperbole than reality. Klarna still has human support, and Salesforce still has many engineering job listings. Sadly, some of these stories appear influenced by ulterior motives, such as Klarna's strategic positioning as an "AI-native" company to capture premium valuations ahead of its IPO amid the current AI wave.

However, I have been using AI tools daily for the past three years, both at work and outside, and find them immensely useful. How do I square these benefits with the study results?

On Study Design

When confronted with results that go counter to one's expectations, it is a natural instinct to try to attack the study and identify holes to explain away the result. For example, one could point to the small sample size of 16 developers. There is also the argument that the study was conducted in a very specific context, with experienced developers working on projects they are intimately familiar with.

There might also have been a subtle selection effect in the tasks themselves: since project maintainers proposed their own task lists, it is possible that those more experienced with AI subconsciously selected issues they believed were more amenable to an agentic workflow. One could also argue that the developers were subject to the Hawthorne effect, altering their behavior simply because they knew they were being video-recorded, perhaps over-relying on the AI tools for the sake of the experiment.

Finally, and perhaps most importantly, the experimental setup of requiring screen recordings and active time tracking for a single task enforced a synchronous workflow. This effectively locked developers into what I call "supervision mode", where they had to watch the agent work rather than being free to context-switch to another problem.

Some of these critiques, particularly the enforced "supervision" workflow, could directly contribute to the observed slowdown. But others, such as selecting "AI-friendly" tasks or over-relying on the tool to impress researchers, should have biased the results toward a speedup. This makes the final outcome even more notable. The direction of various potential biases is ambiguous at best, which is why we must look at the study's core design.

As a randomized controlled trial, the study follows the gold-standard experimental design for detecting causality. By randomizing individual tasks to "AI-allowed" or "AI-disallowed", the study isolates the effect of AI tooling. Instead of comparing one group of developers against a control group (where differences in skill could skew the results), it compares each developer against themselves. This "within-subjects" design controls for individual characteristics, from typing speed to experience with the project. With such a study design, results are harder to write off as mere statistical noise, even with a smaller sample size.

Crucially, the tasks were defined before this randomization. This avoids a common pitfall where AI might simply produce more verbose code or encourage developers to break tasks into smaller pull requests, which can inflate some productivity metrics without representing more work getting done.
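As a rough illustration of per-task randomization (not the procedure METR actually used), the sketch below gives each predefined task an independent coin flip. The function name assign_conditions and the seed argument are hypothetical; the seed only exists so that a run can be repeated.

```shell
#!/usr/bin/env bash

# Assign each task read from stdin (one per line) to "AI-allowed" or
# "AI-disallowed".
#
# $1 - Seed for awk's random number generator (hypothetical; defaults to 1)
assign_conditions() {
    local seed="${1:-1}"

    # Every task gets an independent 50/50 draw, so the task list should be
    # final before this runs:
    awk -v seed="${seed}" '
        BEGIN { srand(seed) }
        {
            label = (rand() < 0.5) ? "AI-allowed" : "AI-disallowed"
            print label "\t" $0
        }
    '
}

# Usage (made-up task titles):
#   printf '%s\n' "Add array function" "Fix lint rule" | assign_conditions 42
```

Because each developer ends up with tasks in both conditions, the comparison stays within the same person, which is the property the within-subjects design relies on.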

16 developers from several open-source projects might not sound like much, but, in total, we completed 246 tasks. To give a sense of the work involved, the tasks Haris and I worked on were not trivial, while still being hand scoped to be completed in a few hours or less. They were a mix of core feature development (such as adding new array, string, and BLAS functions), creating custom ESLint rules to enforce project-specific coding standards, enhancing our CI/CD pipelines with new automation, and fixing bugs from our issue tracker.

And while a single developer's performance on one task is likely correlated with their performance on another, which makes the estimates less precise than they would otherwise be, it is quite notable that the effect was in the opposite direction from what economists, ML experts, and the developers themselves predicted (with the former two groups being more in the range of a 40% speedup). Moreover, the effect is quite large in magnitude. A quick back-of-the-envelope calculation reveals that if the true effect were a 40% speedup, the probability of observing a result this far in the opposite direction would be astronomically low.
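One way to sketch such a back-of-the-envelope check is a z-score on the log of the completion-time ratio. This is an illustration under stated assumptions, not the study's own analysis: a "40% speedup" is read here as a time ratio of 1/1.4, the estimate is treated as approximately normal, and the standard error SE is left symbolic rather than taken from the paper.

```latex
\hat{\theta} = \ln(1.19) \approx 0.174, \qquad
\theta_0 = \ln\!\left(\tfrac{1}{1.4}\right) \approx -0.336, \qquad
z = \frac{\hat{\theta} - \theta_0}{\mathrm{SE}} \approx \frac{0.510}{\mathrm{SE}}
```

Under these assumptions, z exceeds 3 (a one-sided tail probability of roughly 0.1%) whenever SE is below about 0.17 in log units, and the probability shrinks rapidly as SE gets smaller. Reading "40% speedup" as a 40% reduction in time instead would only widen the gap.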

In light of this, I have no reason to doubt the internal validity of the study and would venture that the effect measured is real within the context of the experiment. If one believed the chatter on social media and the hype merchants who two years ago were all shilling cryptocurrency (and maybe still are!) but have meanwhile all switched over to extolling the amazing speedup AI offers, then increases of 100%, 5x, or even 10x should have been in the cards. But this is definitively not what the study observed.

Embracing Agentic Development

The more important consideration for squaring my own experience with these results is external validity: how generalizable are the study's findings? The paper is a great read and touches on many possible criticisms and threats to external validity, and I won't belabor any of the points raised therein.

Instead, I will solely focus on my experience as a study participant and how I have been leveraging AI with success. I will also share my own hypotheses for why the performance of the developers in this sample was overall negatively affected by the use of AI.

To give some context, my main way of incorporating LLMs into my work before participating in this study was twofold. As something of an early adopter, I had used GitHub Copilot for auto-completion and inline suggestions and made heavy use of ChatGPT and Anthropic Claude web apps by assembling relevant context, writing detailed prompts, and copying results back into my editor. Tools such as Repomix helped streamline the process of incorporating LLMs into my daily development workflow. This general approach allowed me to review changes quickly, iterate on them by asking questions, and have the LLM make follow-up edits directly in a chat interface.

The METR study subsequently provided an excuse for me to delve into agentic programming and make Cursor an integral part of my workflow. I had used it briefly some time before but didn't find the AI-generated results compelling enough to let it loose on any codebase I was working on. But Claude Sonnet 3.7 had come out, which is still one of the most powerful models for coding tasks. Due to some very encouraging results during early testing, I was eager to put it to work on a backlog of tooling that we wanted to build for stdlib, alongside various refactoring and bug fixes.

One of my first impressions with Cursor this time around was the underlying LLM's rather impressive ability to follow the very specific coding standards and conventions of the project and, when placed in agent mode, to automatically and reliably fix lint errors and attempt to iteratively resolve errors in unit tests. This felt like another step change in capabilities, just like when OpenAI released GPT-3 Davinci in June 2020, which made a lot of use cases suddenly feasible that before would break down in any realistic scenario.

While I no longer use Cursor and have meanwhile switched to Claude Code (more on that later), I found Cursor straightforward to use, especially given that it is a fork of VSCode, which has been my IDE of choice for many years. I strongly doubt that inexperience with Cursor, which I shared with roughly half of the developers in the study, played a major role in the results. While I didn't have an extensive .cursorrules setup (which has since been deprecated in favor of project rules), I did add basic instructions and context about the project and made sure to index the stdlib codebase. Aside from that, further customization was neither possible nor necessary, as the Cursor Agent was able to automatically pull in other files, look up function call signatures, and perform other operations for assembling context.

My experience of Cursor was largely positive during the study. As an example, I ended up working on several Bash scripts for our CI/CD pipeline, and Cursor definitely sped up my development workflow by sparing me from looking up the man page of jq for the eleventh time, given that I only use this command-line tool for manipulating JSON once in a blue moon. With the AI agent's help, I could quickly generate a function like this one to check if a GitHub issue has a specific label:

# Check if an issue has the "Tracking Issue" label.
#
# $1 - Issue number
is_tracking_issue() {
    local issue_number="$1"
    local response

    debug_log "Checking if issue #${issue_number} is a tracking issue"
    # Get the issue:
    if ! response=$(github_api "GET" "/repos/${repo_owner}/${repo_name}/issues/${issue_number}"); then
        echo "Warning: Failed to fetch issue #${issue_number}" >&2
        return 1
    fi

    # ...

    # Check if the issue has the "Tracking Issue" label:
    if echo "$response" | jq -r '.labels[].name' 2>/dev/null | grep -q "Tracking Issue"; then
        debug_log "Issue #${issue_number} is a tracking issue"
        return 0
    else
        debug_log "Issue #${issue_number} is not a tracking issue"
        return 1
    fi
}

The agent correctly assembled the jq -r '.labels[].name' filter to extract the label names from the JSON response—something that would have sent me to a documentation page for a few minutes. While a small speed bump, these moments add up. The AI handled the rote task of recalling obscure syntax, letting me focus on the actual logic.

My first takeaway is this: current LLMs are very powerful for tasks in domains that you are not intimately familiar with, allowing you to move much more quickly. Agentic tools such as Cursor and Claude Code are also very helpful to quickly navigate and learn your way around a large codebase, allowing you to ask questions and explore the codebase in a natural way. Leveraging "deep research" provides another means to more exhaustively explore a problem space in a way that the search engines of old simply cannot match.

On the other hand, some tasks were very frustrating. For example, the Cursor agent wrote one ESLint rule almost fully in one shot, but for another one, the Cursor agent was running in circles and unable to figure out the correct algorithm. Trying to prompt it to fix the bug was unsuccessful multiple times. It would have been better to not fall prey to the sunk cost fallacy and instead throw away the code and then either give the agent another shot or write it myself.

Cursor does have a neat checkpoints feature, which allows you to stop the agent at any time and revert to a prior state, something I wholeheartedly recommend using. It is a great way to avoid getting stuck in a loop of the agent trying to fix a bug that it cannot figure out.

I freely admit that I may have been a bit overeager about using AI for all of the AI-enabled tasks, partly due to my desire to learn to use Cursor productively but also due to my general amazement at what these new technologies unlock. However, the METR study suggests that the question of whether a task can be more efficiently completed by AI, or whether one would be better off completing it by hand, is far from settled.

The Blank Slate Problem

Aside from occasional inefficiencies and outright mistakes in the generated code, coding agents do not have access to all the implicit knowledge and conventions of a large, mature project, which often might not be written down. In his reflections on the study, John Whiles identifies a core conflict: an expert engineer's primary value isn't just writing code; it's holding a complete, evolving mental model of the entire system in their head. The agent does not have such a mental model. Every interaction starts from a blank slate.

It is possible that some of this can be mitigated with better, more targeted instructions. As usual, there is no free lunch. One has to actively invest in making one's codebase more accessible to coding agents. And more generally, memory and learning remain unsolved problems for transformer-based LLMs, and changing that will likely require fundamental architectural advancements.

The necessity of auditing the agent's code for mistakes created two major sources of friction: the cognitive drain of 'babysitting' the AI and the time spent waiting for and reviewing its output. For every minute the agent spent running in circles on that ESLint rule, I was blocked, my attention monopolized by the need to supervise its flawed process. This synchronous, blocking workflow is exhausting and inefficient. It's the digital equivalent of shoulder-surfing an overconfident junior developer who has memorized everything there is to know about programming but cannot be trusted and who will make subtle mistakes that are hard to spot.

My advice: stay in the driver's seat during such pair programming and use the AI as a sparring partner to bounce ideas back and forth instead of yielding agency.

Delegate, Don't Supervise

Partly based on my experiences in the study, my workflow has evolved, and I have subsequently switched to using Anthropic's Claude Code. This has changed my interaction model from synchronous supervision to asynchronous delegation. I can now define a complex task via Claude Code's planning mode and then have the agent work on the task in the background. I can then turn my full attention elsewhere, be it attending a meeting, reviewing a colleague's code, or simply thinking through the next problem without interruption. Claude's work happens in parallel and is not a blocker to my own. The cognitive cost of babysitting is replaced by the much lower cost of reviewing a completed proposal later; if it didn't work out, I might just throw away the code and have the model try again, instead of engaging in a fruitless back and forth.

Claude Sonnet 4 and Opus 4 were not released at the time the METR study was conducted, and, while they mark another improvement, especially with regard to tool use by the model, the dynamics haven't fundamentally changed. The models still make mistakes and do not always implement things in an optimal or sound way, but they are now much better at following instructions and can work uninterrupted for longer periods of time.

At least for me, in contrast to those who frame coding agents as mere "stochastic parrots", I find myself absolutely amazed that, despite its warts and hiccups, we now have a technology that, given a set of instructions, is able to generate a fully formed pull request that correctly implements logic, adheres to style guidelines, and has a passing test suite. And, in the best cases, this can happen without any human intervention.

The First 80 Percent

We still need to reconcile the observed performance decrease with how many developers, including myself, have now been leveraging AI to get tasks done in a fraction of the time, tasks that would have taken them hours or days previously. I believe that the Pareto Principle is a helpful yardstick. Named after Italian economist Vilfredo Pareto, it is commonly referred to as the 80/20 rule and posits that roughly 80% of effects come from 20% of the causes. Coding agents can now generate code that mostly works but that might fall short if the goal is 100%.

In many instances, coding agents can easily accomplish the first 80% of a programming task, generating boilerplate, scaffolding logic, implementing core functionality, and writing a test suite. However, the final 20% of the task (handling tricky edge cases, adhering to unwritten architectural conventions, ensuring optimal performance, and avoiding code duplication by reusing existing utilities) is where the complexity lies. This last mile still requires the developer's deep, stateful mental model of the project. The rub here is that, by using the AI agent, one may bypass all the little steps which are necessary in the process of building that mental model.

But does it matter? When working on a crucial piece of a larger, complex system, it definitely does, and I would be hesitant to rely on generative AI. But when working on a well-defined, isolated piece of code with expected behavior for inputs and outputs, why bother? The marginal cost of writing code (long recognized as only a small part of software engineering) is going to zero. In the event that there is a problem with the code, it can simply be thrown away and rewritten. The code that AI agents now generate is of decent quality, well-documented, and capable of adhering to one's coding conventions.

This brings to mind the following quote by Kent Beck.

The value of 90% of my skills just dropped to $0. The leverage for the remaining 10% went up 1000x. I need to recalibrate.

AI as a force multiplier is why I am long on AI, even though the METR study is a good reminder that we all can easily fall prey to cognitive biases.

In Thinking, Fast and Slow, Daniel Kahneman gives a classic example for biases driven by the availability heuristic: people overestimate plane crash risks due to vivid media coverage, making such events more "available" to memory than statistically riskier, yet routine, car crashes. Our judgment is swayed not by data, but by the ease of recall. In the case of working with AI agents, observing them build fully-functioning tools in seconds is a very memorable and visceral experience. On the other hand, the slow, frustrating "death by a thousand cuts" experience of auditing, debugging, and correcting the AI's subtle mistakes is the equivalent of the mundane car crash. It's a distributed cost with no single dramatic moment.

Nevertheless, I have no reason to believe that this technology will not continue to improve, and I, for one, am excited about the possibilities. For any big and ambitious project, the number of tickets to be completed, features to be implemented, and bugs to be fixed vastly outstrips the available time and human bandwidth to work on them.

What Future Studies Should Tell Us

It remains to be seen whether the results of the METR study can be replicated. However, the study clearly demonstrated that experts and developers were overly optimistic about the impact of AI on productivity. This is an important insight that should inform future research.

In some ways, the study raises more new questions than it answers. It looked at a very particular situation: seasoned experts working in the familiar territory of their own large, mature projects. Future studies by METR and others could vary these conditions. What happens when we throw developers into unfamiliar codebases, where, at least per my anecdotal experience, AI agents shine? Or what about junior developers or new contributors to an established open-source codebase? Under what conditions can AI act as a great equalizer, compressing the skill gap and providing a speed boost rather than slowdown?

Furthermore, the current study centered on completion time, but faster isn't always better. One possible follow-up would be a blinded study where human experts review pull requests without knowing if AI was involved. We could then measure things like the number of review cycles, the time spent in review, and the long-term maintainability of the code. This might shed light on when and how AI-assisted development trades short-term speed for long-term technical debt.

Finally, the field of AI is still evolving at a rapid pace. The synchronous workflow that the study's setup encouraged could be fundamentally suboptimal. Exploring different interaction models, such as the asynchronous delegation workflow that I've moved to, could yield very different results.

How to Work With AI Now

What follows are my current recommendations for using AI in your daily workflow based on my experiences and the METR study.

Adopt an Asynchronous Workflow

The biggest drain from using AI is the cognitive load of "babysitting" it. Instead of watching the agent work, adopt an asynchronous model:

Define one or more tasks (e.g., running a set of commands to audit a codebase for lint errors and documentation mistakes) and then let AI agents work on them in the background (e.g., in separate Git worktrees of your repository), and turn your attention elsewhere.
Review the completed task(s) later. If the output is flawed, it's often better to discard it and have the model try again with a better prompt rather than engaging in a frustrating back-and-forth.
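A minimal sketch of the worktree setup described above, assuming Git 2.17 or newer and that the commands run from the repository root: each task gets its own branch and checkout so that background agents do not step on each other. The helper names, the agent/ branch prefix, and the ../worktrees/ layout are hypothetical, and how the agent is launched inside each worktree is left out because it depends on your tooling.

```shell
#!/usr/bin/env bash

# Create an isolated worktree and branch for a delegated task and print its path.
#
# $1 - Task name (used for the branch and directory names)
# $2 - Base branch or commit (defaults to HEAD)
start_task() {
    local task="$1"
    local base="${2:-HEAD}"
    local dir="../worktrees/${task}"

    git worktree add -b "agent/${task}" "${dir}" "${base}" >/dev/null && echo "${dir}"
}

# Discard a task whose output did not pass review, removing both the
# worktree and its branch.
#
# $1 - Task name
discard_task() {
    local task="$1"

    git worktree remove --force "../worktrees/${task}" \
        && git branch -D "agent/${task}" >/dev/null
}

# Usage:
#   dir="$(start_task fix-lint-errors develop)"   # hand "${dir}" to an agent
#   discard_task fix-lint-errors                  # throw the attempt away
```

Keeping the discard step to a single command lowers the cost of throwing away a flawed attempt, which makes retrying with a better prompt cheaper than a long back-and-forth.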

Know What to Delegate

AI can now handle the first 80% of many programming tasks, but the final 20% often requires deep context. The key is to choose the right tasks for AI:

"Vibe Code" and Prototypes: use AI for mock-ups or small, isolated tools that can be thrown away. This is where the technology's speed offers a distinct advantage.
Verifiable Code: AI is excellent for tasks that can be fully verified against an existing, robust test suite. The tests act as a safety net to catch the subtle mistakes the AI might make.
Boilerplate Code: AI can quickly generate boilerplate code, such as REST API endpoints or form validation, and can do so in a way that follows project conventions.
Learning and Navigation: use AI to quickly learn your way around a large codebase, document previously undocumented code, or to get help with tools you use infrequently. Asking LLMs questions can be much faster than hunting through documentation, particularly if that documentation is split across multiple resources.

Use and Customize Claude Code

For tools such as Claude Code, customization is a helpful means of writing down any implicit knowledge about the project that is not readily accessible from the code alone.

Provide Proper Context: drag and drop relevant files (this can include images!) into the Claude Code window for the model to use as context for the task at hand. One approach I have found useful is to add TODO comments in the codebase with the required changes, and then have Claude Code work on them. Use the planning mode to have the model think through the task and generate a plan that you can approve, rather than having it jump immediately into implementation.
Use Project Memory: use CLAUDE.md files to give the model project-specific memory, specifically on its architecture and unwritten knowledge. You can have multiple CLAUDE.md files in different project sub-directories, and the model will intelligently pick up the most relevant one based on your current context.

Automate Repetitive Actions: create custom slash commands for routine tasks that you perform frequently. Below is an example stdlib:review-changed-packages command that I run to flag any possible errors in PRs that were recently merged to our development branch:

- Pull down the latest changes from the develop branch of the stdlib repository.
- Get all commits from the past $ARGUMENTS day(s) that were merged to the develop branch
- Extract a list of @stdlib packages touched by those commits
- Review the packages for any typos, bugs, violations of the stdlib style guidelines, or inconsistencies introduced by the changes.
- Fix any issues found during the review.

Build Custom Tooling: use the Claude CLI to build small, automated tools, such as a review bot that flags typos as a daily cron job. For fuzzy tasks such as pointing out typos or inconsistencies in a PR, it's best to let Claude generate output that can be verified by a human. For well-defined tasks that can be fully automated, it is better to have Claude produce code that runs deterministically and can be verified.
Set up Hooks to Automate Actions: hooks are a powerful new feature of Claude Code that allows you to run scripts and commands at different points in Claude's agentic lifecycle.
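As a rough sketch of what the second and third steps of the review-changed-packages command above amount to, the functions below list files changed on a branch over the past N days and reduce them to package names. This is an illustration under assumptions rather than what Claude Code actually runs: the function names are hypothetical, and the path handling assumes that packages live under an @stdlib/ directory with conventional subdirectories such as lib, test, and docs.

```shell
#!/usr/bin/env bash

# Reduce file paths read from stdin to a sorted, de-duplicated list of
# @stdlib package names.
paths_to_packages() {
    # Keep everything from "@stdlib/" onward, strip conventional package
    # subdirectories, then strip top-level files such as package.json:
    grep -oE '@stdlib/[^[:space:]]+' \
        | sed -E 's#/(benchmark|docs|examples|include|lib|src|test)/.*$##; s#/[^/]+\.[^/]+$##' \
        | sort -u
}

# List packages touched by commits on a branch during the past N days.
#
# $1 - Number of days (defaults to 1)
# $2 - Branch name (defaults to develop)
changed_packages() {
    local days="${1:-1}"
    local branch="${2:-develop}"

    git log "${branch}" --since="${days} days ago" --name-only --pretty=format: \
        | paths_to_packages
}

# Usage (from a checkout of the repository):
#   changed_packages 3 develop
```

A deterministic list like this can be produced by a script, leaving the agent to focus on the fuzzier review step.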

Final Thoughts

It's natural to attack a study whose results you don't like. A better response is to ask what they might be telling you. For me, it tells me there is still a lot to learn about how to use this new, powerful, but often deeply weird and unpredictable technology. One mistake is treating it as the driver in a pair programming session that requires your constant attention. Instead, treat it like a batch process for grunt work, freeing you to focus on the problems that actually require a human brain.

GSoC 2025 Projects Announced

Philipp Burckhardt — Fri, 09 May 2025 02:30:04 +0000

Today, we are grateful to announce that stdlib, the fundamental numerical library for JavaScript, was awarded five slots in this year's Google Summer of Code (GSoC). We participated in the program last year for the first time, and had four talented students working on a variety of projects. It was a resounding success, which we hope to surpass this year given all that we have learned over the past year and a half.

This achievement comes after a tremendously productive start to 2025. Since January 1st of this year, the stdlib community has:

Opened two thousand PRs with 1,377 successfully merged.
Welcomed contributions from 88 different contributors.
Added 3,452 commits to the repository.

For GSoC, we received 99 excellent applications from enthusiastic students. Ranking proposals was tough, and we would have loved for a few more projects to be accepted. We are grateful to everyone who applied and encourage those not selected this year to stay connected, continue to contribute to the project, and apply again next year! In fact, one of this year's accepted contributors was a repeat applicant, demonstrating how persistence and continued engagement can pay off.

The accepted projects are listed below. Each project addresses key areas that will expand JavaScript's potential for technical and scientific applications.

Add LAPACK bindings and implementations for linear algebra
Contributor: Aayush Khanna

The goal of Aayush's project is to develop JavaScript and C implementations of LAPACK (Linear Algebra Package) routines. This project aims to extend conventional LAPACK APIs by borrowing ideas from BLIS, thus ensuring easy compatibility with stdlib ndarrays and adding support for both row-major (C-style) and column-major (Fortran-style) storage layouts. This work will help overcome LAPACK's column-major limitation and thus make advanced linear algebra operations more accessible and efficient in JavaScript environments.
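To make the layout point concrete, here is a minimal sketch of how a single routine can serve both storage layouts when the layout is expressed purely as a pair of strides. This is an illustration only, not stdlib's or LAPACK's actual API: the helper names `strides` and `rowSums` and their signatures are hypothetical.

```javascript
// Hypothetical helper: returns the strides (in elements) of an M-by-N matrix
// stored in a linear buffer in either row-major or column-major order.
function strides( M, N, order ) {
	return ( order === 'row-major' ) ? [ N, 1 ] : [ 1, M ];
}

// Hypothetical kernel: computes the sum of each row of an M-by-N matrix.
// The layout is never inspected directly; only the strides are used.
function rowSums( M, N, A, strideA1, strideA2, offsetA ) {
	const out = new Float64Array( M );
	for ( let i = 0; i < M; i++ ) {
		let s = 0.0;
		for ( let j = 0; j < N; j++ ) {
			s += A[ offsetA + ( i*strideA1 ) + ( j*strideA2 ) ];
		}
		out[ i ] = s;
	}
	return out;
}

// The same 2x3 matrix [ [ 1, 2, 3 ], [ 4, 5, 6 ] ] stored in both layouts:
const rowMajor = new Float64Array( [ 1, 2, 3, 4, 5, 6 ] );
const colMajor = new Float64Array( [ 1, 4, 2, 5, 3, 6 ] );

const sr = strides( 2, 3, 'row-major' );
const sc = strides( 2, 3, 'column-major' );

const r1 = rowSums( 2, 3, rowMajor, sr[ 0 ], sr[ 1 ], 0 );
const r2 = rowSums( 2, 3, colMajor, sc[ 0 ], sc[ 1 ], 0 );
// Both calls produce the row sums 6 and 15.
```

Because the kernel only ever sees strides and an offset, no data needs to be copied or transposed to switch between layouts.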

Expanding array-based statistical computation in stdlib
Contributor: Gururaj Gurram

Gururaj will advance statistical operations in stdlib by introducing convenience array wrappers for all existing strided APIs, thus improving developer ergonomics for common use cases. Additionally, he will develop specialized ndarray statistical kernels with the aim of facilitating efficient statistical reductions across multi-dimensional data.
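As a rough sketch of what a convenience array wrapper over a strided API can look like, consider the following. The functions `stridedSum` and `sum` are hypothetical stand-ins written for this illustration and are not the packages being developed in the project.

```javascript
// Hypothetical strided kernel: sums N elements of x, starting at index
// `offset` and advancing by `stride` elements on each step.
function stridedSum( N, x, stride, offset ) {
	let total = 0.0;
	let ix = offset;
	for ( let i = 0; i < N; i++ ) {
		total += x[ ix ];
		ix += stride;
	}
	return total;
}

// Hypothetical convenience wrapper: covers the common case of reducing an
// entire array, so callers need not supply a length, stride, or offset.
function sum( x ) {
	return stridedSum( x.length, x, 1, 0 );
}

const x = [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 ];

const total = sum( x );
// returns 21

const everyOther = stridedSum( 3, x, 2, 0 );
// returns 9 (1 + 3 + 5)

const lastThreeReversed = stridedSum( 3, x, -1, 5 );
// returns 15 (6 + 5 + 4)
```

The strided form remains available when a view over part of a buffer is needed, while the wrapper keeps the everyday call site short.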

Implement base special mathematical functions in JavaScript and C
Contributor: Karan Anand

Karan will implement and enhance lower-level scalar kernels for special mathematical functions in stdlib. The goal is to complete missing C implementations for existing double-precision packages, develop new single-precision versions, and ensure consistency, accuracy, and IEEE 754 compliance. These enhancements will provide developers with the most comprehensive set of high-precision mathematical tools for scientific computing in JavaScript.

Achieve ndarray API parity with built-in JavaScript arrays
Contributor: Muhammad Haris

Haris will extend stdlib's ndarray capabilities by implementing familiar JavaScript array methods like concat, find, flat, includes, indexOf, reduce, and sort for multi-dimensional arrays. The project will develop high-performance C implementations with Node.js native add-ons for compute-intensive operations. These enhancements will allow JavaScript developers to work with multi-dimensional arrays as easily as built-in arrays, significantly expanding JavaScript's capabilities for scientific and numerical computing.

Add BLAS bindings and implementations for linear algebra
Contributor: Shabareesh Shetty

Shabareesh will expand stdlib's BLAS (Basic Linear Algebra Subprograms) support by implementing missing Level 2 (matrix-vector) and Level 3 (matrix-matrix) operations in JavaScript, C, Fortran, and WebAssembly. The project will focus on key dependencies for LAPACK routines and create performance-optimized APIs that work in both browser and server environments. These enhancements will provide essential building blocks for developing high-performance machine learning and statistical analysis applications on the web.

We're excited to see these projects develop over the coming months. Each contribution will significantly enhance stdlib's capabilities and make advanced mathematical and statistical operations more accessible to the JavaScript community. The work done by these talented contributors will help bridge the gap between traditional scientific computing environments and JavaScript, furthering our mission to create a comprehensive, high-performance standard library for JavaScript.

We'd like to extend thanks to Google for their continued support of open-source development through the Summer of Code program, and we look forward to sharing updates as the above projects progress over the course of this summer. In addition to watching for more posts on this blog, you can follow development by joining our community chat. We also hold regular office hours over video conferencing, which is a great opportunity to ask questions, share ideas, and engage directly with the stdlib team.

We hope that you'll join us in our mission to advance cutting-edge scientific computation in JavaScript. Start by showing your support and starring the project on GitHub today: https://github.com/stdlib-js/stdlib.

Google Summer of Code 2025

Athan — Thu, 27 Feb 2025 18:22:14 +0000

We're thrilled to announce that stdlib was accepted as a Google Summer of Code mentoring organization for 2025!

We are beyond excited to share that stdlib has once again been accepted as a mentoring organization for Google Summer of Code 2025! This marks our second consecutive year participating in this incredible program, and we cannot wait to work alongside aspiring open source contributors to push the boundaries of scientific computing on the web.

Google Summer of Code (GSoC) is a global initiative that introduces new contributors to open source software by offering mentorship and funding for meaningful, long-term projects. Over the years, GSoC has been instrumental in helping open source projects like stdlib grow, while also giving participants valuable real-world software development experience. With our acceptance into GSoC 2025, we are looking forward to welcoming a new wave of enthusiastic contributors who share our vision of making JavaScript and the extended ecosystem of TypeScript, Node.js, Deno, and other JavaScript runtimes first-class environments for numerical and scientific computing.

Reflecting on GSoC 2024: A Year of Growth

Last year marked our first time participating in GSoC, and we could not have asked for a better experience. We had the privilege of mentoring four incredibly talented contributors, each of whom made substantial contributions to the stdlib ecosystem.

From integrating BLAS bindings and optimizing special mathematical functions to enhancing support for boolean arrays and improving our interactive REPL experience, their work strengthened the foundation of stdlib and paved the way for even greater advancements. Beyond just code, their contributions sparked deeper engagement within our community, leading to over 2,000 pull requests from more than 100 contributors and 3,000+ new commits to stdlib since February 2024.

If you missed our retrospective on last year's program, be sure to check out our blog post: Reflecting on GSoC 2024.

What's in Store for GSoC 2025?

As we gear up for GSoC 2025, we have a range of exciting project ideas that we hope will inspire potential contributors. Whether you're passionate about numerical computing, statistical modeling, performance optimization, or developer tooling, there's something for you. Some areas we're particularly excited about include:

BLAS/LAPACK: continuing to expand stdlib's coverage of BLAS and LAPACK operations to provide a robust foundation for linear algebra and machine learning in JavaScript and Node.js.
WebAssembly: compiling BLAS and statistical kernels to WebAssembly with support for ergonomic inter-operation between WebAssembly and JavaScript.
ndarray kernels: implementing lower-level ndarray kernels for efficient element-wise iteration and reduction to improve performance.
Improving developer tooling: improving the stdlib development experience by creating better tools for automation, publishing, and managing the stdlib package ecosystem.
Expanding statistical distributions: building on previous efforts to provide C implementations for special mathematical functions, thus unlocking a wider range of probability distributions and making stdlib a comparable alternative to SciPy for statistical computing in JavaScript.

These ideas, however, are just the beginning. We believe that innovation comes from collaboration, and we welcome fresh ideas from prospective contributors. If you have a project concept that aligns with our mission and a clear plan for execution, we would love to hear about it. Our current list of ideas is available on our GSoC repository, but don't feel constrained by it—great ideas come from all directions!

How to Get Involved

If you're interested in contributing to stdlib for GSoC 2025, now is the perfect time to get started. Here's how you can begin your journey:

Explore stdlib: familiarize yourself with the project by browsing the project's GitHub repository and reading our documentation.
Join the conversation: engage with the stdlib community on Element to discuss project ideas, ask questions, and connect with mentors.
Review our guidelines: carefully read our GSoC Application Guidelines to understand what we're looking for in a proposal.
Start contributing: we strongly encourage all applicants to contribute to stdlib before submitting their application. This can be in the form of a bug fix, new feature, performance improvement, or some other enhancement to stdlib's capabilities.

The official GSoC timeline is as follows:

February 27 – March 24: prospective contributors discuss project ideas with mentoring organizations.
March 24 – April 8: application period (final deadline: April 8 at 18:00 UTC).
May 8: accepted proposals announced.
May 8 – June 1: community bonding period.
June 2 – September 1: standard 12-week coding period.

For the full timeline, visit the GSoC 2025 Timeline.

Looking Ahead

As we embark on another exciting GSoC season, we want to extend our deepest gratitude to Google for this opportunity. We are incredibly excited to meet new contributors, explore new ideas, and continue building an open source ecosystem where JavaScript thrives as a language for scientific computing.

If you're passionate about building high-quality software and eager to make an impact, we invite you to join us. We can't wait to see your ideas and begin working together to advance scientific computing in JavaScript. Let's make this year's GSoC program one to remember!

New ways to engage with the stdlib community!

Athan — Tue, 14 Jan 2025 00:47:48 +0000

Fostering a vibrant and inclusive community is crucial for ensuring the long-term success of open-source software, and stdlib is no exception. We believe that collaboration and open communication are key to driving innovation and making scientific computing on the web accessible to everyone. To that end, we're thrilled to announce two new initiatives designed to make it easier than ever for contributors, users, and maintainers to connect, collaborate, and grow together!

Weekly Office Hours

As part of our efforts to enhance transparency and collaboration, we're proud to announce weekly office hours! We've been running these informally for the past few months, and they've been a wonderful success, providing high-bandwidth opportunities to connect with project maintainers, users, and new and existing stdlib contributors.

To facilitate the coordination of office hours and other public project meetings, we've created a public GitHub repository to serve as a centralized hub where community members can propose agenda topics, review discussion points, and participate in shaping the direction of stdlib. Each week in advance of the next office hours, we'll create a new dedicated agenda issue, where you can link issues and pull requests you want to discuss, post questions in advance, and share any pre-reads. Thus far, agendas have run the gamut, from project overviews to live code reviews to discussions about the project roadmap to upcoming events and community announcements.

In short, if you have questions about stdlib or if you need help fixing a bug, figuring out what to do next, or are just looking for feedback, this is your time to shine! Please join our weekly office hours to connect with project maintainers, stay updated on the latest project news, and chat with other community members. This is a great opportunity to ask questions, share ideas, and engage directly with the stdlib team.

Everyone is welcome—drop in and say hello!

Public Community Calendar

Second, we're excited to introduce our new public community calendar, where you can stay up-to-date with all stdlib events, including office hours, project orientations, development meetings, and other important happenings.

With this calendar, you can:

Find the dates and times of upcoming office hours and meetings.
Add our events to your own calendar for easy reminders.
Stay informed about new opportunities to engage with the stdlib team and community.

How You Can Get Involved

Here are a few ways you can make the most of these new resources:

Bookmark the community calendar or add it to your own. Be on the lookout for upcoming events, and mark your calendar to join us.
Engage on GitHub. Visit our meetings repository to propose agenda topics or contribute to ongoing discussions.
Attend Office Hours. Whether you're stuck on a problem or curious about the latest project updates, office hours are an excellent opportunity to connect and learn.
Spread the Word. Help us grow the stdlib community by sharing these updates with anyone who might be interested.

Let's Build Together!

We're committed to creating a supportive and inspiring environment for everyone in the scientific computing ecosystem, and we're excited to see how these new initiatives will help our community thrive. Needless to say, we can't wait to connect with you at our next office hours!

Together, we're building the future of scientific computing on the web! 🚀

2024 Retrospective

Athan — Sat, 04 Jan 2025 21:16:29 +0000

A look back at 2024 and a preview of the year ahead.

2024 was a landmark year for stdlib, packed with progress, innovation, and community growth. Looking back, I am struck by the amount of time and effort members of the stdlib community spent refining existing APIs, crafting new functionality, and laying the groundwork for an exciting road ahead. I feel incredibly fortunate to be part of a community that is actively shaping the future of scientific computing on the web, and I am bullish on our continued success in the months to come.

In this post, I'll provide a recap of some key highlights and foreshadow what's in store for 2025. While I'll be making various shoutouts to individual contributors, none of what we accomplished this year could have happened without the entire stdlib community. The community was instrumental in doing the hard work necessary to make stdlib a success, from finding and patching bugs to reviewing pull requests and triaging issues to diving deep into the weeds of numerical algorithms and software design. If I don't mention you by name, please know that your efforts are recognized and greatly appreciated. A big thank you to everyone involved and to everyone who's helped out along the way, in ways both big and small. ❤️

TL;DR

This past year was transformative for stdlib, marked by significant growth, innovation, and community contributions. Some key highlights include:

Community Growth: 84 new contributors joined stdlib, tripling the size of our developer community and driving over 4,000 commits, 2,200 pull requests, and the release of 500+ new packages.
Google Summer of Code: four exceptional contributors helped advance critical projects, including enhanced REPL capabilities, expanded BLAS support, and new mathematical APIs.
Enhanced Developer Tools: major strides in automation included automated changelog generation, improved CI workflows, and better test coverage tracking.
Technical Milestones: significant progress was made in linear algebra (BLAS and LAPACK), fancy indexing, WebAssembly integrations, and C implementations of mathematical functions, all aimed at making JavaScript a first-class language for scientific computing.
Future Vision: looking ahead to 2025, we aim to expand our math libraries, improve REPL interactivity, explore WebGPU, and continue building tools to make scientific computing on the web more powerful and accessible.

With stdlib’s rapid growth and the collective efforts of our global community, we're shaping the future of scientific computing on the web. Join us as we take the next steps in this exciting journey!

Stats

To kick things off, some high-level year-end statistics. This year,

84 new contributors from across the world joined stdlib, tripling our developer community size and bringing new life and fresh perspectives to the project.
Together, we made over 4000 commits to the main development branch.
We opened nearly 2200 pull requests, with over 1600 of those pull requests merged.
And we shipped over 500 new packages in the project, ranging from new linear algebra routines to specialized math functions to foundational infrastructure for multi-dimensional arrays to APIs supporting WebAssembly and other accelerated environments.

These accomplishments reflect the hard work and dedication of our community. It was a busy year, and we were forced to think critically about how we can effectively scale the project and our community as both continue to grow. This meant investing in tooling and automation, improving our review and release processes, and figuring out ways to quickly identify and upskill new contributors.

Google Summer of Code

The one event which really set things in motion for stdlib in 2024 was our acceptance into Google Summer of Code (GSoC). We had previously applied in 2023, but were rejected. So when we applied in 2024, we didn't think we had much of a chance. Much to our surprise and delight, stdlib was accepted, thus setting off a mad dash to get our affairs together so that we could handle the influx of contributors to come.

GSoC ended up being a transformative experience for stdlib, bringing in talented contributors and pushing forward critical projects. As we detailed in our GSoC reflection, the road was bumpy, but we learned a lot and came out the other side. Needless to say, we were extremely lucky to have four truly excellent GSoC contributors: Aman Bhansali, Gunj Joshi, Jaysukh Makvana, and Snehil Shah. I'll have a bit more to say about their work in the sections below.

REPL

The Node.js read-eval-print loop (REPL) is often something of an afterthought in the JavaScript world, both underutilized and underappreciated. From stdlib's earliest days, we wanted to create a better REPL experience, with integrated support for stdlib's scientific computing and data processing functionality. Development of the stdlib REPL has come in fits and starts, but there's always been a goal of matching the power and feature set of Python's IPython in order to facilitate interactive exploratory data analysis in JavaScript. We were thus quite excited when Snehil Shah expressed interest in working on the stdlib REPL as part of GSoC.

Snehil already covered some of his work in a previous blog post on "Welcoming colors to the REPL!", but his and others' work covered so much more. A few highlights:

Preview completions: when typing characters matching a known symbol in the REPL, a completion preview is now displayed, helping facilitate auto-completion and saving developers precious keystrokes. Shoutout to Tudor Pagu, in particular, for adding this!
Multi-line editing: prior to adding support for multi-line editing, the REPL supported multi-line inputs, but did not support editing previously entered lines, often leading to a frustrating user experience. Now, the REPL supports multi-line editing within the terminal similar to dedicated editor applications.
Pagination of long outputs: a longstanding feature request has been to add support for something like less/more to the stdlib REPL. Previously, if a command generated a long output, a user could be confronted with a wall of text. This has now been addressed, with the hope of adding more advanced less-like search functionality in the months ahead.
Bracketed-paste: pasting multi-line input into the REPL used to execute the input line-by-line, instead of pasting it as a single prompt. While useful in some cases, this is often not the desired intent, especially when a user wishes to paste and edit multi-line input before execution. With bracketed-paste support, pasted multi-line input is now treated as a single prompt.
Custom syntax-highlighting themes: developers who are used to developing in IDEs can often feel adrift when moving to a terminal lacking some of the niceties of their favorite editor. One of those niceties is syntax-highlighting. Accordingly, we worked to add support for custom theming, as detailed in Snehil's blog post.
Auto-pairing: another common IDE nicety is the automatic closing of brackets and quotation marks, helping save keystrokes and mitigate the dreaded missing bracket. Never one to shy away from a difficult task, Snehil implemented support for auto-pairing as one of his first pull requests leading up to GSoC.

Largely thanks to Snehil's work, we moved much closer to IPython parity in 2024, thus transforming the JavaScript experience for scientific computing. And we're not done yet. We still have pull requests working their way through the queue, and one thing I am particularly excited about is that we've recently started exploring adding support for the Jupyter protocol. Stay tuned for additional REPL news in 2025!

BLAS

Another area of focus has been the continued development of stdlib's BLAS (Basic Linear Algebra Subprograms) support, which provides fundamental APIs for common linear algebra operations, such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication. Coming into 2024, BLAS support in stdlib was rather incomplete, particularly in terms of its support for complex-valued floating-point data types. The tide began to change with Jaysukh Makvana's efforts to achieve feature parity of stdlib's Complex64Array and Complex128Array data structures with built-in JavaScript typed arrays.

These efforts subsequently paved the way for adding Level 1 BLAS support for complex-valued typed array data types and the work of Aman Bhansali, who set out to further Level 2 and Level 3 BLAS support in stdlib. After focusing initially on lower-level BLAS strided array interfaces, Aman expanded his scope by adding WebAssembly implementations and by adding support for applying BLAS operations to stacks of matrices and vectors via higher-level multi-dimensional array (a.k.a., ndarray) APIs.

In addition to conventional BLAS routines, stdlib includes BLAS-like routines which are not a part of reference BLAS. These routines include APIs for alternative scalar and cumulative summation algorithms, sorting strided arrays, filling and manipulating strided array elements, explicit handling of NaN values, and other operations which don't fall neatly under the banner of linear algebra, but are common when working with data.
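To give a flavor of why alternative summation algorithms and explicit NaN handling matter, here is a minimal sketch comparing naive summation with a compensated (Kahan-Babuška-Neumaier style) summation, plus a variant that skips NaN values. The function names are hypothetical and the code is written for illustration; it is not taken from stdlib's implementations.

```javascript
// Naive left-to-right summation:
function naiveSum( x ) {
	let sum = 0.0;
	for ( let i = 0; i < x.length; i++ ) {
		sum += x[ i ];
	}
	return sum;
}

// Compensated summation (Neumaier's variant of Kahan summation): a running
// correction term `c` accumulates the low-order bits lost in each addition.
function compensatedSum( x ) {
	let sum = 0.0;
	let c = 0.0;
	for ( let i = 0; i < x.length; i++ ) {
		const v = x[ i ];
		const t = sum + v;
		if ( Math.abs( sum ) >= Math.abs( v ) ) {
			c += ( sum - t ) + v; // low-order bits of `v` were lost
		} else {
			c += ( v - t ) + sum; // low-order bits of `sum` were lost
		}
		sum = t;
	}
	return sum + c;
}

// Hypothetical NaN-aware variant: ignores NaN values rather than
// propagating them.
function nanSum( x ) {
	return compensatedSum( x.filter( ( v ) => !Number.isNaN( v ) ) );
}

const data = [ 1.0, 1.0e100, 1.0, -1.0e100 ];

const naive = naiveSum( data );
// returns 0

const compensated = compensatedSum( data );
// returns 2

const skipped = nanSum( [ 1.0, NaN, 2.0 ] );
// returns 3
```

The naive loop loses both small addends once they are absorbed into the large intermediate value, while the compensated loop recovers them.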

During Aman's BLAS work, we cleaned up and refactored BLAS implementations, and Muhammad Haris volunteered to extend those efforts to our extended BLAS routines. His efforts entailed migrating Node.js native add-ons to pure C in order to reduce boilerplate and leverage our extensive collection of C macros for authoring of native add-ons and further entailed adding dedicated C APIs to facilitate interfacing with stdlib's ndarrays.

These developments ensure that stdlib continues to lead the way in linear algebra support for JavaScript developers, offering powerful tools for numerical computing. While much has been completed, more work remains, and BLAS will continue to be a focal point in 2025.

LAPACK

Building on the BLAS work as part of an internship at Quansight Labs, Pranav Goswami worked to lay the foundations for LAPACK (Linear Algebra Package) support in stdlib in order to provide higher-order linear algebra routines for solving systems of linear equations, eigenvalue problems, matrix factorization, and singular value decomposition. Detailed more fully in his post-internship blog post, Pranav sought to establish an approach for testing and documentation of added implementations and to leverage the ideas of BLIS to create LAPACK interfaces which facilitated interfacing with stdlib's ndarrays and thus minimized data movement and storage requirements. While a good chunk of time was spent working out the kinks and iterating on API design, Pranav made significant headway in adding various implementation utilities and nearly 30 commonly used LAPACK routines. Given the enormity of LAPACK (~1700 routines), this work will continue into the foreseeable future, so be on the lookout for more updates in the months ahead!

As a quick aside, if you're interested in learning more about how stdlib approaches interfacing with Fortran libraries, many of which still form the bedrock of modern numerical computing, be sure to check out Pranav's blog post on calling Fortran routines from JavaScript using Node.js.

C implementations of special math functions

One of stdlib's longstanding priorities is continued development of its vectorized routines for common mathematical and statistical operations. While all scalar mathematical kernels (e.g., transcendental functions, such as sin, cos, erf, gamma, etc., and statistical distribution density functions) have JavaScript implementations, many of the kernels lacked corresponding C implementations, which are needed for unlocking faster performance in Node.js and other server-side JavaScript runtimes supporting native bindings.

Gunj Joshi and others sought to fill this gap and opened over 160 pull requests adding dedicated C implementations. At this point, only a few of the most heavily used double-precision transcendental functions remain (looking at you betainc!). Efforts have now turned to completing single-precision support and adding C implementations for statistical distribution functions. We expect this work to continue for the first half of 2025 before turning our attention to higher-level strided array and ndarray APIs, with implementations for both WebAssembly and Node.js native add-ons.

Fancy indexing

Another area where we made significant progress is in improving slicing and array manipulation ergonomics. Users of numerical programming languages, such as MATLAB and Julia, and dedicated numerical computing libraries, such as NumPy, have long enjoyed the benefit of concise syntax for expressing operations affecting only a subset of array elements. For example, the following snippet demonstrates setting every other element in an array to zero with NumPy.

import numpy as np

# Create an array of ones:
x = np.ones(10)

# Set every other element to zero:
x[::2] = 0.0

As a language, JavaScript does not provide such convenient syntax, forcing users to either use more verbose object methods or manual for loops. We thus sought to address this gap by leveraging Proxy objects to support "fancy indexing". While the use of Proxy objects does incur some performance overhead due to property indirection, you now need only install and import a single package to get all the benefits of Python-style slicing in JavaScript, thus obviating the need for verbose for loops and making array manipulation significantly more ergonomic.

import array2fancy from '@stdlib/array-to-fancy';

// Create a plain array:
const x = [ 1, 2, 3, 4, 5, 6, 7, 8 ];

// Turn the plain array into a "fancy" array:
const y = array2fancy( x );

// Select the first three elements:
let v = y[ ':3' ];
// returns [ 1, 2, 3 ]

// Select every other element, starting from the second element:
v = y[ '1::2' ];
// returns [ 2, 4, 6, 8 ]

// Select every other element, in reverse order, starting with the last element:
v = y[ '::-2' ];
// returns [ 8, 6, 4, 2 ]

// Set all elements to the same value:
y[ ':' ] = 9;

// Create a shallow copy by selecting all elements:
v = y[ ':' ];
// returns [ 9, 9, 9, 9, 9, 9, 9, 9 ]
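For readers curious how a Proxy can intercept slice expressions, the following is a simplified sketch of the general technique. It is not stdlib's implementation: the names `toFancy`, `sliceIndices`, and `resolve` are hypothetical, reads return a plain array copy rather than another fancy array, and assignment only supports broadcasting a single value.

```javascript
// Matches Python-style slice strings such as ':3', '1::2', and '::-2':
const SLICE = /^(-?\d+)?:(-?\d+)?(:(-?\d+)?)?$/;

// Resolves one slice bound, normalizing negative values and clamping to
// the valid range for the direction of traversal.
function resolve( token, fallback, len, step ) {
	if ( token === void 0 || token === '' ) {
		return fallback;
	}
	let v = parseInt( token, 10 );
	if ( v < 0 ) {
		v += len;
		if ( v < 0 ) {
			v = ( step < 0 ) ? -1 : 0;
		}
	} else if ( v >= len ) {
		v = ( step < 0 ) ? len-1 : len;
	}
	return v;
}

// Converts a slice string to the list of indices it selects:
function sliceIndices( str, len ) {
	const parts = str.split( ':' );
	const step = ( parts[ 2 ] ) ? parseInt( parts[ 2 ], 10 ) : 1;
	if ( step === 0 ) {
		throw new RangeError( 'slice step cannot be zero' );
	}
	const start = resolve( parts[ 0 ], ( step > 0 ) ? 0 : len-1, len, step );
	const stop = resolve( parts[ 1 ], ( step > 0 ) ? len : -1, len, step );
	const out = [];
	for ( let i = start; ( step > 0 ) ? i < stop : i > stop; i += step ) {
		out.push( i );
	}
	return out;
}

// Wraps an array in a Proxy which routes slice-like property names to the
// slice logic and forwards everything else to the underlying array:
function toFancy( arr ) {
	return new Proxy( arr, {
		get( target, prop, receiver ) {
			if ( typeof prop === 'string' && SLICE.test( prop ) ) {
				return sliceIndices( prop, target.length ).map( ( i ) => target[ i ] );
			}
			return Reflect.get( target, prop, receiver );
		},
		set( target, prop, value, receiver ) {
			if ( typeof prop === 'string' && SLICE.test( prop ) ) {
				sliceIndices( prop, target.length ).forEach( ( i ) => {
					target[ i ] = value;
				});
				return true;
			}
			return Reflect.set( target, prop, value, receiver );
		}
	});
}

const fancy = toFancy( [ 1, 2, 3, 4, 5, 6, 7, 8 ] );

const firstThree = fancy[ ':3' ];
// returns [ 1, 2, 3 ]

const reversed = fancy[ '::-2' ];
// returns [ 8, 6, 4, 2 ]

fancy[ '::2' ] = 0;
// fancy now holds [ 0, 2, 0, 4, 0, 6, 0, 8 ]
```

Every property access passes through the `get` and `set` traps, which is where the indirection overhead mentioned above comes from.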

In addition to slice semantics, Jaysukh added support to stdlib for boolean arrays, thus laying the groundwork for boolean array masking.

import BooleanArray from '@stdlib/array-bool';
import array2fancy from '@stdlib/array-to-fancy';

// Create a plain array:
const x = [ 1, 2, 3, 4, 5, 6, 7, 8 ];

// Turn the plain array into a "fancy" array:
const y = array2fancy( x );

// Create a shorthand alias for creating an array "index" object:
const idx = array2fancy.idx;

// Create a boolean mask array:
const mask = new BooleanArray( [ true, false, false, true, true, true, false, false ] );

// Retrieve elements according to the mask:
const z = y[ idx( mask ) ];
// returns [ 1, 4, 5, 6 ]

We subsequently applied what we learned from adding boolean array masking to add support for integer array indexing.

import Int32Array from '@stdlib/array-int32';
import array2fancy from '@stdlib/array-to-fancy';

// Create a plain array:
const x = [ 1, 2, 3, 4, 5, 6, 7, 8 ];

// Turn the plain array into a "fancy" array:
const y = array2fancy( x );

// Create a shorthand alias for creating an array "index" object:
const idx = array2fancy.idx;

// Create an integer array:
const indices = new Int32Array( [ 0, 3, 4, 5 ] );

// Retrieve selected elements:
const z = y[ idx( indices ) ];
// returns [ 1, 4, 5, 6 ]
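Conceptually, both forms of indexing reduce to simple selection loops. The following plain-JavaScript sketch shows the semantics without any stdlib dependency; the helper names `selectByMask` and `selectByIndices` are hypothetical, and treating negative indices as counting from the end is an assumption made for this illustration.

```javascript
// Hypothetical helper: keeps the elements of x whose mask entry is true.
function selectByMask( x, mask ) {
	if ( x.length !== mask.length ) {
		throw new RangeError( 'mask must have the same length as the array' );
	}
	const out = [];
	for ( let i = 0; i < x.length; i++ ) {
		if ( mask[ i ] ) {
			out.push( x[ i ] );
		}
	}
	return out;
}

// Hypothetical helper: gathers the elements of x at the given indices.
// Assumption: negative indices count backward from the last element.
function selectByIndices( x, indices ) {
	const out = [];
	for ( const idx of indices ) {
		const i = ( idx < 0 ) ? idx + x.length : idx;
		if ( i < 0 || i >= x.length ) {
			throw new RangeError( 'index out of bounds: ' + idx );
		}
		out.push( x[ i ] );
	}
	return out;
}

const values = [ 1, 2, 3, 4, 5, 6, 7, 8 ];

const masked = selectByMask( values, [ true, false, false, true, true, true, false, false ] );
// returns [ 1, 4, 5, 6 ]

const gathered = selectByIndices( values, new Int32Array( [ 0, 3, 4, 5 ] ) );
// returns [ 1, 4, 5, 6 ]
```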

While the above demonstrates fancy indexing with built-in JavaScript array objects, we've recently extended the concept of fancy indexing to stdlib ndarrays, a topic we'll have more to say about in a future blog post.

Needless to say, we are particularly excited about these developments because we believe they will significantly improve the user experience of interactive computing and exploratory data analysis in JavaScript.

Test and build

Lastly, 2024 was a year of automation, and I would be remiss if I didn't mention the efforts of Philipp Burckhardt. Philipp was instrumental in improving our CI build and test infrastructure and improving the overall scalability of the project. His work was prolific, but there are a few key highlights I want to bring to the fore.

Automatic changelog generation: Philipp shepherded the project toward using conventional commits, which is a standardized way of adding human- and machine-readable meaning to commit messages, and subsequently built a robust set of tools for performing automatic releases, generating comprehensive changelogs, and coordinating the publishing of stdlib's ever-growing ecosystem of over 4000 standalone packages. What was once a manual release process can now be done by running a single GitHub workflow.
stdlib bot: Philipp created a GitHub pull request bot for automating pull request review tasks, posting helpful messages, and improving the overall maintainer development experience. In the months ahead, we're particularly keen to extend the bot's functionality to help with new contributor onboarding and flagging common contribution issues.
Test coverage automation: with a project of stdlib's size, running the entire test suite on each commit and for each pull request is simply not possible. It can thus be challenging to stitch together individual package test coverage reports in order to obtain a global view of overall test coverage. Philipp worked to address this problem by creating an automation pipeline for uploading individual test coverage reports to a dedicated repository, with support for tracking coverage metrics over time and creating expected test coverage changes for each submitted pull request. Needless to say, this has drastically improved our visibility into test coverage metrics and helped improve our confidence in tests accompanying submitted pull requests.
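To illustrate why conventional commits make changelog generation tractable, here is a minimal sketch that parses commit headers of the form `type(scope)!: description` and groups them by type. It is a simplified illustration, not stdlib's release tooling: the function names and the sample commit messages are hypothetical.

```javascript
// Matches a conventional commit header: a type, an optional scope in
// parentheses, an optional "!" marking a breaking change, and a description.
const HEADER = /^(\w+)(?:\(([^)]+)\))?(!)?: (.+)$/;

// Parses the first line of a commit message, returning null when the
// message does not follow the convention:
function parseCommit( message ) {
	const header = message.split( '\n' )[ 0 ];
	const m = HEADER.exec( header );
	if ( m === null ) {
		return null;
	}
	return {
		'type': m[ 1 ],
		'scope': m[ 2 ] || null,
		'breaking': m[ 3 ] === '!',
		'description': m[ 4 ]
	};
}

// Groups commit descriptions by type, skipping non-conforming messages:
function groupByType( messages ) {
	const sections = {};
	for ( const msg of messages ) {
		const c = parseCommit( msg );
		if ( c === null ) {
			continue;
		}
		if ( !sections[ c.type ] ) {
			sections[ c.type ] = [];
		}
		sections[ c.type ].push( c.description );
	}
	return sections;
}

// Hypothetical commit messages:
const sections = groupByType([
	'feat(math): add single-precision kernel',
	'fix: handle NaN inputs',
	'feat!: remove deprecated alias',
	'Merge branch develop'
]);
// returns { 'feat': [ 'add single-precision kernel', 'remove deprecated alias' ], 'fix': [ 'handle NaN inputs' ] }
```

Because each header carries a machine-readable type, a release tool can sort commits into sections such as features and bug fixes without a human reading every message.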

While we've made considerable strides in our project automation tooling, we never seem to be short of ideas for further automation and tooling improvements. Expect more to come in 2025! 🤖

Look ahead

So what's in store for 2025?! Glad you asked!

We've already alluded to various initiatives in the sections above, but, at a high level, here's where we're planning to focus our efforts in the year ahead:

GSoC 2025: assuming Google runs its annual Google Summer of Code program and we're fortunate enough to be accepted again, we'd love to continue supporting the next generation of open source contributors.
Math and stats C implementations: expanding our library of scalar math and statistics kernels and ensuring double- and single-precision parity.
BLAS: completing our WebAssembly distribution and higher-level APIs for operating on stacks of matrices and vectors.
LAPACK: continuing to chip away at the ~1700 LAPACK routines (!).
FFTs: adding initial Fast Fourier Transform (FFT) support to stdlib to help unlock algorithms for signal processing.
Vectorized operations: automating package creation for vectorized operations over scalar math and statistics kernels.
ndarray API parity: expanding the usability and familiarity of ndarrays by achieving API parity with built-in JavaScript arrays and typed arrays.
REPL: adding Jupyter-protocol support and various user-ergonomics improvements.
WebGPU: while we haven't formally committed to any specific approach, we're keen on at least exploring support for WebGPU, an emerging web standard that enables webpages to use a device's graphics processing unit (GPU) efficiently, including for general-purpose GPU computation, in order to provide APIs for accelerated scientific computing on the web.
Project funding: exploring and hopefully securing project funding to accelerate development efforts and support the continued growth of the stdlib community.

That's definitely a lot, and it's going to take a village—a community of people dedicated to our mission of making the web a first-class platform for numerical and scientific computing. If you're ready to join us in building the future of scientific computing on the web, we'd love for you to join us. Check out our contributing guide to see how you can get involved.

A personal note

As we look ahead, I'd like to share a personal reflection on what this year has meant to me. Given our growth this year, I often felt like I was drinking from a fire hose. And, honestly, it can be hard not to get burned out when you wake up day after day to over 100 new notifications and messages from folks wanting guidance, answers to questions, and pull requests reviewed. But, when reflecting on this past year, I am awfully proud of what we've accomplished, and I am especially heartened when I see contributors new to open source grow and flourish, sometimes using the lessons they've learned contributing as a springboard to dream jobs and opportunities. Having the fortune to see that is a driving motivation and a privilege within the greater world of open source that I do my best to not take for granted.

And with that, this concludes the 2024 retrospective. Looking back on all we've achieved together, the future of scientific computing on the web has never been brighter! Thank you again to everyone involved who's helped out along the way. The road ahead is filled with exciting opportunities, and we can't wait to see what we will achieve together in 2025. Onward and upward! 🚀

LAPACK in your web browser

Pranav Chiku — Fri, 20 Dec 2024 23:06:26 +0000

This post was originally published on the Quansight Labs blog and has been modified and republished here with Quansight's permission.

Web applications are rapidly emerging as a new frontier for high-performance scientific computation and AI-enabled end-user experiences. Underpinning the ML/AI revolution is linear algebra, a branch of mathematics concerning linear equations and their representations in vector spaces and via matrices. LAPACK ("Linear Algebra Package") is a fundamental software library for numerical linear algebra, providing robust, battle-tested implementations of common matrix operations. Despite LAPACK being a foundational component of most numerical computing programming languages and libraries, a comprehensive, high-quality LAPACK implementation tailored to the unique constraints of the web has yet to materialize. That is...until now.

Earlier this year, I had the great fortune of being a summer intern at Quansight Labs, the public benefit division of Quansight and a leader in the scientific Python ecosystem. During my internship, I worked to add initial LAPACK support to stdlib, a fundamental library for scientific computation written in C and JavaScript and optimized for use in web browsers and other web-native environments, such as Node.js and Deno. In this blog post, I'll discuss my journey, some expected and unexpected (!) challenges, and the road ahead. My hope is that this work, with a little bit of luck, provides a critical building block in making web browsers a first-class environment for numerical computation and machine learning and portends a future of more powerful AI-enabled web applications.

Sound interesting? Let's go!

What is stdlib?

Readers of this blog who are familiar with LAPACK are likely to not be intimately familiar with the wild world of web technologies. For those who come from the world of numerical and scientific computation and are familiar with the scientific Python ecosystem, the easiest way to think of stdlib is as an open source scientific computing library in the mold of NumPy and SciPy. It provides multi-dimensional array data structures and associated routines for mathematics, statistics, and linear algebra, but uses JavaScript, rather than Python, as its primary scripting language. As such, stdlib is laser-focused on the web ecosystem and its application development paradigms. This focus necessitates some interesting design and project architecture decisions, which make stdlib rather unique when compared to more traditional libraries designed for numerical computation.

To take NumPy as an example, NumPy is a single monolithic library, where all of its components, outside of optional third-party dependencies such as OpenBLAS, form a single, indivisible unit. One cannot simply install NumPy routines for array manipulation without installing all of NumPy. If you are deploying an application which only needs NumPy's ndarray object and a couple of its manipulation routines, installing and bundling all of NumPy means including a considerable amount of "dead code". In web development parlance, we'd say that NumPy is not "tree shakeable". For a normal NumPy installation, this implies at least 30MB of disk space, and at least 15MB of disk space for a customized build which excludes all debug statements. For SciPy, those numbers can balloon to 130MB and 50MB, respectively. Needless to say, shipping a 15MB library in a web application for just a few functions is a non-starter, especially for developers needing to deploy web applications to devices with poor network connectivity or memory constraints.

Given the unique constraints of web application development, stdlib takes a bottom-up approach to its design, where every unit of functionality can be installed and consumed independently of unrelated and unused parts of the codebase. By embracing a decomposable software architecture and radical modularity, stdlib offers users the ability to install and use exactly what they need, with little-to-no excess code beyond a desired set of APIs and their explicit dependencies, thus ensuring smaller memory footprints, bundle sizes, and faster deployment.

As an example, suppose you are working with two stacks of matrices (i.e., two-dimensional slices of three-dimensional cubes), and you want to select every other slice and perform the common BLAS operation y += a * x, where x and y are ndarrays and a is a scalar constant. To do this with NumPy, you'd first install all of NumPy

pip install numpy

and then perform the various operations

# Import all of NumPy:
import numpy as np

# Define arrays:
x = np.asarray(...)
y = np.asarray(...)

# Perform operation:
y[::2,:,:] += 5.0 * x[::2,:,:]

With stdlib, in addition to having the ability to install the project as a monolithic library, you can install the various units of functionality as separate packages

npm install @stdlib/ndarray-fancy @stdlib/blas-daxpy

and then perform the various operations

// Individually import desired functionality:
import FancyArray from '@stdlib/ndarray-fancy';
import daxpy from '@stdlib/blas-daxpy';

// Define ndarray meta data:
const shape = [4, 4, 4];
const strides = [...];
const offset = 0;

// Define arrays using a "lower-level" fancy array constructor:
const x = new FancyArray('float64', [...], shape, strides, offset, 'row-major');
const y = new FancyArray('float64', [...], shape, strides, offset, 'row-major');

// Perform operation:
daxpy(5.0, x['::2,:,:'], y['::2,:,:']);

Importantly, not only can you independently install any one of stdlib's over 4,000 packages, but you can also fix, improve, and remix any one of those packages by forking an associated GitHub repository (e.g., see @stdlib/ndarray-fancy). By defining explicit layers of abstraction and dependency trees, stdlib offers you the freedom to choose the right layer of abstraction for your application. In some ways, it's a simple—and, if you're accustomed to conventional scientific software library design, perhaps unorthodox—idea, but, when tightly integrated with the web platform, it has powerful consequences and creates exciting new possibilities!

What about WebAssembly?

Okay, so maybe your interest has been piqued; stdlib seems intriguing. But what does this have to do with LAPACK in web browsers? Well, one of our goals this past summer was to apply the stdlib ethos (small, narrowly scoped packages which do one thing and do one thing well) in bringing LAPACK to the web.

But wait, you say! That is an extreme undertaking. LAPACK is vast, with approximately 1,700 routines, and implementing even 10% of them within a reasonable time frame is a significant challenge. Wouldn't it be better to just compile LAPACK to WebAssembly, a portable compilation target for programming languages such as C, Go, and Rust, which enables deployment on the web, and call it a day?

Unfortunately, there are several issues with this approach.

Compiling Fortran to WebAssembly is still an area of active development (see 1, 2, 3, 4, and 5). At the time of this post, a common approach is to use f2c to compile Fortran to C and then to perform a separate compilation step to convert C to WebAssembly. However, this approach is problematic as f2c only fully supports Fortran 77, and the generated code requires extensive patching. Work is underway to develop an LLVM-based Fortran compiler, but gaps and complex toolchains remain.
As alluded to above in the discussion concerning monolithic libraries in web applications, the vastness of LAPACK is part of the problem. Even if the compilation problem is solved, including a single WebAssembly binary containing all of LAPACK in a web application needing to use only one or two LAPACK routines means considerable dead code, resulting in slower loading times and increased memory consumption.
While one could attempt to compile individual LAPACK routines to standalone WebAssembly binaries, doing so could result in binary bloat, as multiple standalone binaries may contain duplicated code from common dependencies. To mitigate binary bloat, one could attempt to perform module splitting. In this scenario, one first factors out common dependencies into a standalone binary containing shared code and then generates separate binaries for individual APIs. While suitable in some cases, this can quickly get unwieldy, as this approach requires linking individual WebAssembly modules at load-time by stitching together the exports of one or more modules with the imports of one or more other modules. Not only can this be tedious, but this approach also entails a performance penalty due to the fact that, when WebAssembly routines call imported exports, they now must cross over into JavaScript, rather than remaining within WebAssembly. Sound complex? It is!
Apart from WebAssembly modules operating exclusively on scalar input arguments (e.g., computing the sine of a single number), every WebAssembly module instance must be associated with WebAssembly memory, which is allocated in fixed increments of 64KiB (i.e., a "page"). And importantly, as of this blog post, WebAssembly memory can only grow and never shrink. As there is currently no mechanism for releasing memory to a host, a WebAssembly application's memory footprint can only increase. These two aspects combined increase the likelihood of allocating memory which is never used and the prevalence of memory leaks.
Lastly, while powerful, WebAssembly entails a steeper learning curve and a more complex set of often rapidly evolving toolchains. In end-user applications, interfacing between JavaScript—a web-native dynamically-compiled programming language—and WebAssembly further brings increased complexity, especially when having to perform manual memory management.
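
The page-granular, grow-only nature of WebAssembly memory mentioned above can be observed directly from JavaScript. The following is a minimal sketch using the standard WebAssembly.Memory API; the page counts are arbitrary values chosen for illustration.

```javascript
// Allocate one page of WebAssembly memory and allow growth up to three pages:
const mem = new WebAssembly.Memory({
    'initial': 1,
    'maximum': 3
});

const PAGE_SIZE = 65536; // 64KiB

// Number of pages before growth:
const before = mem.buffer.byteLength / PAGE_SIZE;

// Even if an application needs only a few additional bytes, memory can only
// be grown in whole pages. `grow` returns the previous size in pages:
const previous = mem.grow(1);

// Number of pages after growth:
const after = mem.buffer.byteLength / PAGE_SIZE;

console.log(before, previous, after);

// Note: the `WebAssembly.Memory` API provides `grow`, but no counterpart for
// releasing pages back to the host.
```

Any bytes in the newly added page which the application never touches still count toward its memory footprint for the lifetime of the memory instance.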

To help illustrate the last point, let's return to the BLAS routine daxpy, which performs the operation y = a*x + y, where x and y are strided vectors and a is a scalar constant. If implemented in C, a basic implementation might look like the following code snippet.

void c_daxpy(const int N, const double alpha, const double *X, const int strideX, double *Y, const int strideY) {
    int ix;
    int iy;
    int i;
    if (N <= 0) {
        return;
    }
    if (alpha == 0.0) {
        return;
    }
    if (strideX < 0) {
        ix = (1-N) * strideX;
    } else {
        ix = 0;
    }
    if (strideY < 0) {
        iy = (1-N) * strideY;
    } else {
        iy = 0;
    }
    for (i = 0; i < N; i++) {
        Y[iy] += alpha * X[ix];
        ix += strideX;
        iy += strideY;
    }
    return;
}

After compilation to WebAssembly and loading the WebAssembly binary into our web application, we need to perform a series of steps before we can call the c_daxpy routine from JavaScript. First, we need to instantiate a new WebAssembly module.

const binary = new Uint8Array([...]);

const mod = new WebAssembly.Module(binary);

Next, we need to define module memory and create a new WebAssembly module instance.

// Initialize 10 pages of memory and allow growth to 100 pages:
const mem = new WebAssembly.Memory({
    'initial': 10,  // 640KiB, where each page is 64KiB
    'maximum': 100  // 6.25MiB (6400KiB)
});

// Create a new module instance:
const instance = new WebAssembly.Instance(mod, {
    'env': {
        'memory': mem
    }
});

After creating a module instance, we can now invoke the exported BLAS routine. However, if data is defined outside of module memory, we first need to copy that data to the memory instance and always do so in little-endian byte order.

// External data:
const xdata = new Float64Array([...]);
const ydata = new Float64Array([...]);

// Specify a vector length:
const N = 5;

// Specify vector strides (in units of elements):
const strideX = 2;
const strideY = 4;

// Define pointers (i.e., byte offsets) for storing two vectors:
const xptr = 0;
const yptr = N * 8; // 8 bytes per double

// Create a DataView over module memory:
const view = new DataView(mem.buffer);

// Resolve the first indexed elements in both `xdata` and `ydata`:
let offsetX = 0;
if (strideX < 0) {
    offsetX = (1-N) * strideX;
}
let offsetY = 0;
if (strideY < 0) {
    offsetY = (1-N) * strideY;
}

// Write data to the memory instance:
for (let i = 0; i < N; i++) {
    view.setFloat64(xptr+(i*8), xdata[offsetX+(i*strideX)], true);
    view.setFloat64(yptr+(i*8), ydata[offsetY+(i*strideY)], true);
}

Now that data is written to module memory, we can call the c_daxpy routine.

instance.exports.c_daxpy(N, 5.0, xptr, 1, yptr, 1);

And, finally, if we need to pass the results to a downstream library without support for WebAssembly memory "pointers" (i.e., byte offsets), such as D3, for visualization or further analysis, we need to copy data from module memory back to the original output array.

for (let i = 0; i < N; i++) {
    ydata[offsetY+(i*strideY)] = view.getFloat64(yptr+(i*8), true);
}

That's a lot of work just to compute y = a*x + y. In contrast, compare to a plain JavaScript implementation, which might look like the following code snippet.

function daxpy(N, alpha, X, strideX, Y, strideY) {
    let ix;
    let iy;
    let i;
    if (N <= 0) {
        return;
    }
    if (alpha === 0.0) {
        return;
    }
    if (strideX < 0) {
        ix = (1-N) * strideX;
    } else {
        ix = 0;
    }
    if (strideY < 0) {
        iy = (1-N) * strideY;
    } else {
        iy = 0;
    }
    for (i = 0; i < N; i++) {
        Y[iy] += alpha * X[ix];
        ix += strideX;
        iy += strideY;
    }
    return;
}

With the JavaScript implementation, we can then directly call daxpy with our externally defined data without the data movement required in the WebAssembly example above.

daxpy(N, 5.0, xdata, strideX, ydata, strideY);
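
To make the stride semantics concrete, here is a small worked example with made-up input values. It uses a condensed version of the JavaScript implementation above (returning Y for convenience, which is a choice made for this sketch) and shows a positive stride which skips elements alongside a negative stride which traverses a vector in reverse.

```javascript
// Condensed version of the plain JavaScript implementation shown above:
function daxpy(N, alpha, X, strideX, Y, strideY) {
    if (N <= 0 || alpha === 0.0) {
        return Y;
    }
    // For negative strides, begin at the last indexed element:
    let ix = (strideX < 0) ? (1-N) * strideX : 0;
    let iy = (strideY < 0) ? (1-N) * strideY : 0;
    for (let i = 0; i < N; i++) {
        Y[iy] += alpha * X[ix];
        ix += strideX;
        iy += strideY;
    }
    return Y;
}

// Made-up data for illustration:
const xdata = new Float64Array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);
const ydata = new Float64Array([1.0, 1.0, 1.0]);

// Access every other element of `xdata` (1.0, 3.0, 5.0) and traverse `ydata`
// in reverse order, operating on the input arrays in place:
daxpy(3, 5.0, xdata, 2, ydata, -1);

console.log(ydata);
// => Float64Array [ 26, 16, 6 ]
```

No intermediate buffers are allocated, and the result is immediately available to any downstream consumer of ydata.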

At least in this case, not only is the WebAssembly approach less ergonomic, but, as might be expected given the required data movement, there's a negative performance impact, as well, as demonstrated in the following figure.

Figure 1: Performance comparison of stdlib's C, JavaScript, and WebAssembly (Wasm) implementations for the BLAS routine daxpy for increasing array lengths (x-axis). In the Wasm (copy) benchmark, input and output data is copied to and from Wasm memory, leading to poorer performance.

In the figure above, I'm displaying a performance comparison of stdlib's C, JavaScript, and WebAssembly (Wasm) implementations for the BLAS routine daxpy for increasing array lengths, as enumerated along the x-axis. The y-axis shows a normalized rate relative to a baseline C implementation. In the Wasm benchmark, input and output data is allocated and manipulated directly in WebAssembly module memory, and, in the Wasm (copy) benchmark, input and output data is copied to and from WebAssembly module memory, as discussed above. From the chart, we may observe the following:

In general, thanks to highly optimized just-in-time (JIT) compilers, JavaScript code, when carefully written, can execute only 2-to-3 times slower than native code. This result is impressive for a loosely typed, dynamically compiled programming language and, at least for daxpy, remains consistent across varying array lengths.
As data sizes and thus the amount of time spent in a WebAssembly module increase, WebAssembly can approach near-native (~1.5x) speed. This result aligns more generally with expected WebAssembly performance.
While WebAssembly can achieve near-native speed, data movement requirements may adversely affect performance, as observed for daxpy. In such cases, a well-crafted JavaScript implementation which avoids such requirements can achieve equal, if not better, performance, as is the case for daxpy.

Overall, WebAssembly can offer performance improvements; however, the technology is not a silver bullet and needs to be used carefully in order to realize desired gains. And even when offering superior performance, such gains must be balanced against the costs of increased complexity, potentially larger bundle sizes, and more complex toolchains. For many applications, a plain JavaScript implementation will do just fine.
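
One way to avoid the copy overhead discussed above is to allocate data directly in module memory from the start. The following is a minimal sketch, assuming an application which controls where its vectors live; it creates typed array views over WebAssembly memory so that JavaScript and a compiled routine would operate on the same bytes. The pointer layout is an assumption made for illustration, and the call to a compiled routine is left as a comment as no binary is loaded here.

```javascript
// Allocate one page (64KiB) of WebAssembly memory:
const mem = new WebAssembly.Memory({
    'initial': 1
});

// Specify a vector length:
const N = 5;

// Define pointers (i.e., byte offsets) for two contiguous vectors:
const xptr = 0;
const yptr = N * 8; // 8 bytes per double

// Create typed array views directly over module memory (no data is copied):
const x = new Float64Array(mem.buffer, xptr, N);
const y = new Float64Array(mem.buffer, yptr, N);

// Fill the vectors in place:
for (let i = 0; i < N; i++) {
    x[i] = i + 1.0;
    y[i] = 1.0;
}

// A compiled routine could now read and write the same bytes, e.g.,
//
//   instance.exports.c_daxpy(N, 5.0, xptr, 1, yptr, 1);
//
// after which the results would be visible through `y` without copying.

console.log(x.buffer === mem.buffer);
// => true
```

One caveat with this approach is that growing a (non-shared) WebAssembly memory detaches the previous buffer, so views such as x and y need to be re-created after any call to grow.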

Radical modularity

Now that I've prosecuted the case against just compiling the entirety of LAPACK to WebAssembly and calling it a day, where does that leave us? Well, if we're going to embrace the stdlib ethos, it leaves us in need of radical modularity.

To embrace radical modularity is to recognize that what is best is highly contextual, and, depending on the needs and constraints of user applications, developers need the flexibility to pick the right abstraction. If a developer is writing a Node.js application, that may mean binding to hardware-optimized libraries, such as OpenBLAS, Intel MKL, or Apple Accelerate in order to achieve superior performance. If a developer is deploying a web application needing a small set of numerical routines, JavaScript is likely the right tool for the job. And if a developer is working on a large, resource intensive WebAssembly application (e.g., for image editing or a gaming engine), then being able to easily compile individual routines as part of the larger application will be paramount. In short, we want a radically modular LAPACK.

My mission was to lay the groundwork for such an endeavor, to work out the kinks and find the gaps, and to hopefully get us a few steps closer to high-performance linear algebra on the web. But what does radical modularity look like? It all begins with the fundamental unit of functionality, the package.

Every package in stdlib is its own standalone thing, containing co-localized tests, benchmarks, examples, documentation, build files, and associated meta data (including the enumeration of any dependencies) and defining a clear API surface with the outside world. In order to add LAPACK support to stdlib, that means creating a separate standalone package for each LAPACK routine with the following structure:

├── benchmark
│   ├── c
│   │   ├── Makefile
│   │   └── benchmark.c
│   ├── fortran
│   │   ├── Makefile
│   │   └── benchmark.f
│   └── benchmark*.js
├── docs
│   ├── types
│   │   ├── index.d.ts
│   │   └── test.ts
│   └── repl.txt
├── examples
│   ├── c
│   │   ├── Makefile
│   │   └── example.c
│   └── index.js
├── include/*
├── lib
│   ├── index.js
│   └── *.js
├── src
│   ├── Makefile
│   ├── addon.c
│   ├── *.c
│   └── *.f
├── test
│   └── test*.js
├── binding.gyp
├── include.gypi
├── manifest.json
├── package.json
└── README.md

Briefly,

benchmark: a folder containing micro-benchmarks to assess performance relative to a reference implementation (i.e., reference LAPACK).
docs: a folder containing auxiliary documentation including REPL help text and TypeScript declarations defining typed API signatures.
examples: a folder containing executable demonstration code, which, in addition to serving as documentation, helps developers sanity check implementation behavior.
include: a folder containing C header files.
lib: a folder containing JavaScript source implementations, with index.js serving as the package entry point and other *.js files defining internal implementation modules.
src: a folder containing C and Fortran source implementations. Each modular LAPACK package should contain a slightly modified Fortran reference implementation (F77 to free-form Fortran). C files include a plain C implementation which follows the Fortran reference implementation, a wrapper for calling the Fortran reference implementation, a wrapper for calling hardware-optimized libraries (e.g., OpenBLAS) in server-side applications, and a native binding for calling into compiled C from JavaScript in Node.js or a compatible server-side JavaScript runtime.
test: a folder containing unit tests for testing expected behavior in both JavaScript and native implementations. Tests for native implementations are written in JavaScript and leverage the native binding for interoperation between JavaScript and C/Fortran.
binding.gyp/include.gypi: build files for compiling Node.js native add-ons, which provide a bridge between JavaScript and native code.
manifest.json: a configuration file for stdlib's internal C and Fortran compiled source file package management.
package.json: a file containing package meta data, including the enumeration of external package dependencies and a path to a plain JavaScript implementation for use in browser-based web applications.
README.md: a file containing a package's primary documentation, which includes API signatures and examples for both JavaScript and C interfaces.

Given stdlib's demanding documentation and testing requirements, adding support for each routine is a decent amount of work, but the end result is robust, high-quality, and, most importantly, modular code suitable for serving as the foundation for scientific computation on the modern web. But enough with the preliminaries! Let's get down to business!

A multi-phase approach

Building on previous efforts which added BLAS support to stdlib, we decided to follow a similar multi-phase approach when adding LAPACK support. In this approach, we first prioritize JavaScript implementations and their associated testing and documentation and then, once tests and documentation are present, backfill C and Fortran implementations and any associated native bindings to hardware-optimized libraries. This allows us to put some early points on the board, so to speak, quickly getting APIs in front of users, establishing robust test procedures and benchmarks, and investigating potential avenues for tooling and automation before diving into the weeds of build toolchains and performance optimizations. But where to even begin?

To determine which LAPACK routines to target first, I parsed LAPACK's Fortran source code to generate a call graph. This allowed me to infer the dependency tree for each LAPACK routine. With the graph in hand, I then performed a topological sort, which helped me identify routines which have no dependencies and which will inevitably serve as building blocks for other routines. While a depth-first approach in which I picked a particular high-level routine and worked backward would enable me to land a specific feature, such an approach might cause me to get bogged down trying to implement routines of increasing complexity. By focusing on the "leaves" of the graph, I could prioritize commonly used routines (i.e., routines with high indegrees) and thus maximize my impact by unlocking the ability to deliver multiple higher-level routines, whether by me later in my efforts or by other contributors.
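
The prioritization strategy described above can be sketched in a few lines of JavaScript. The call graph below is a small, hand-written, and simplified subset chosen for illustration (it is not the output of parsing LAPACK's source code), and the analyze function is a hypothetical helper which applies Kahn's algorithm to order routines so that dependencies come before the routines which call them.

```javascript
// Simplified, hand-written call graph (routine => routines it calls):
const calls = {
    'dgesv': ['dgetrf', 'dgetrs'],
    'dgetrf': ['dgetrf2', 'dlaswp', 'dtrsm', 'dgemm'],
    'dgetrf2': ['idamax', 'dscal', 'dlaswp', 'dtrsm', 'dgemm'],
    'dgetrs': ['dlaswp', 'dtrsm'],
    'dlaswp': [],
    'dtrsm': [],
    'dgemm': [],
    'idamax': [],
    'dscal': []
};

function analyze(graph) {
    const names = Object.keys(graph);
    const pending = {}; // number of unresolved dependencies per routine
    const callers = {}; // routines which call a given routine
    for (const name of names) {
        pending[name] = graph[name].length;
        callers[name] = [];
    }
    for (const name of names) {
        for (const dep of graph[name]) {
            callers[dep].push(name);
        }
    }
    // Kahn's algorithm: repeatedly emit routines having no unresolved dependencies...
    const queue = names.filter((name) => pending[name] === 0);
    const order = [];
    while (queue.length > 0) {
        const name = queue.shift();
        order.push(name);
        for (const caller of callers[name]) {
            pending[caller] -= 1;
            if (pending[caller] === 0) {
                queue.push(caller);
            }
        }
    }
    return {
        'order': order,
        'callers': callers
    };
}

const result = analyze(calls);

// Rank the "leaves" by the number of routines which depend on them:
const leaves = Object.keys(calls)
    .filter((name) => calls[name].length === 0)
    .sort((a, b) => result.callers[b].length - result.callers[a].length);

console.log(result.order);
console.log(leaves);
```

In this toy graph, dlaswp and dtrsm are each called by three routines, which is the kind of signal that suggests implementing them first.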

With my plan in hand, I was excited to get to work. For my first routine, I chose dlaswp, which performs a series of row interchanges on a general rectangular matrix according to a provided list of pivot indices and which is a key building block for LAPACK's LU decomposition routines. And that is when my challenges began...

Challenges

Legacy Fortran

Prior to my Quansight Labs internship, I was (and still am!) a regular contributor to LFortran, a modern interactive Fortran compiler built on top of LLVM, and I was feeling fairly confident in my Fortran skills. However, one of my first challenges was simply understanding what is now considered "legacy" Fortran code. I highlight three initial hurdles below.

Formatting

LAPACK was originally written in FORTRAN 77 (F77). While the library was moved to Fortran 90 in version 3.2 (2008), legacy conventions still persist in the reference implementation. One of the most visible of those conventions is formatting.

Developers writing F77 programs did so using a fixed form layout inherited from punched cards. This layout had strict requirements concerning the use of character columns:

Comments occupying an entire line must begin with a special character (e.g., *, !, or C) in the first column.
For non-comment lines, 1) the first five columns must be blank or contain a numeric label, 2) column six is reserved for continuation characters, 3) executable statements must begin at column seven, and 4) any code beyond column 72 was ignored.
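
To make the column rules concrete, the following sketch splits a fixed-form source line into its fields. It is an illustrative helper written in JavaScript (the function name parseFixedForm is hypothetical) and is not a complete Fortran lexer; it follows the usual convention that any character other than a blank or zero in column six marks a continuation line.

```javascript
function parseFixedForm(line) {
    // Column 1: a special character marks a full-line comment...
    const first = line.charAt(0);
    if (first === '*' || first === '!' || first === 'C' || first === 'c') {
        return {
            'kind': 'comment',
            'text': line.slice(1).trim()
        };
    }
    // Column 6: a character other than a blank or zero marks a continuation...
    const cont = line.charAt(5);
    return {
        'kind': 'statement',
        'label': line.slice(0, 5).trim(),  // columns 1-5
        'continuation': cont !== '' && cont !== ' ' && cont !== '0',
        'code': line.slice(6, 72).trim()   // columns 7-72; anything beyond is ignored
    };
}

const labeled = parseFixedForm('   10       CONTINUE');
const continued = parseFixedForm('     $   B( LDB, * )');
const comment = parseFixedForm('*     .. Scalar Arguments ..');

console.log(labeled);
console.log(continued);
console.log(comment);
```

Here, the first line yields the numeric label 10 and the statement CONTINUE, the second is flagged as a continuation of a preceding statement, and the third is treated as a comment.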

Fortran 90 introduced the free form layout which removed column and line length restrictions and settled on ! as the comment character. The following code snippet shows the reference implementation for the LAPACK routine dlacpy:

      SUBROUTINE dlacpy( UPLO, M, N, A, LDA, B, LDB )
*
*  -- LAPACK auxiliary routine --
*  -- LAPACK is a software package provided by Univ. of Tennessee,    --
*  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
*
*     .. Scalar Arguments ..
      CHARACTER          UPLO
      INTEGER            LDA, LDB, M, N
*     ..
*     .. Array Arguments ..
      DOUBLE PRECISION   A( LDA, * ), B( LDB, * )
*     ..
*
*  =====================================================================
*
*     .. Local Scalars ..
      INTEGER            I, J
*     ..
*     .. External Functions ..
      LOGICAL            LSAME
      EXTERNAL           lsame
*     ..
*     .. Intrinsic Functions ..
      INTRINSIC          min
*     ..
*     .. Executable Statements ..
*
      IF( lsame( uplo, 'U' ) ) THEN
         DO 20 j = 1, n
            DO 10 i = 1, min( j, m )
               b( i, j ) = a( i, j )
   10       CONTINUE
   20    CONTINUE
      ELSE IF( lsame( uplo, 'L' ) ) THEN
         DO 40 j = 1, n
            DO 30 i = j, m
               b( i, j ) = a( i, j )
   30       CONTINUE
   40    CONTINUE
      ELSE
         DO 60 j = 1, n
            DO 50 i = 1, m
               b( i, j ) = a( i, j )
   50       CONTINUE
   60    CONTINUE
      END IF
      RETURN
*
*     End of DLACPY
*
      END

The next code snippet shows the same routine, but implemented using the free form layout introduced in Fortran 90.

subroutine dlacpy( uplo, M, N, A, LDA, B, LDB )
    implicit none
    ! ..
    ! Scalar arguments:
    character :: uplo
    integer :: LDA, LDB, M, N
    ! ..
    ! Array arguments:
    double precision :: A( LDA, * ), B( LDB, * )
    ! ..
    ! Local scalars:
    integer :: i, j
    ! ..
    ! External functions:
    logical LSAME
    external lsame
    ! ..
    ! Intrinsic functions:
    intrinsic min
    ! ..
    if ( lsame( uplo, 'U' ) ) then
        do j = 1, n
            do i = 1, min( j, m )
               b( i, j ) = a( i, j )
            end do
        end do
    else if( lsame( uplo, 'L' ) ) then
        do j = 1, n
            do i = j, m
               b( i, j ) = a( i, j )
            end do
        end do
    else
        do j = 1, n
            do i = 1, m
               b( i, j ) = a( i, j )
            end do
        end do
    end if
    return
end subroutine dlacpy

As may be observed, by removing column restrictions and moving away from the F77 convention of writing specifiers in ALL CAPS, modern Fortran code is more visibly consistent and thus more readable.

Labeled control structures

Another common practice in LAPACK routines is the use of labeled control structures. For example, consider the following code snippet in which the label 10 must match a corresponding CONTINUE.

      DO 10 I = 1, 10
          PRINT *, I
   10 CONTINUE

Fortran 90 obviated the need for this practice and improved code readability by allowing one to use end do to end a do loop. This change is shown in the free form version of dlacpy provided above.

Assumed-size arrays

To allow flexibility in handling arrays of varying sizes, LAPACK routines commonly operate on arrays having an assumed-size. In the dlacpy routine above, the input matrix A is declared to be a two-dimensional array having an assumed-size according to the expression A(LDA, *). This expression declares that A has LDA number of rows and uses * as a placeholder to indicate that the size of the second dimension is determined by the calling program.

One consequence of using assumed-size arrays is that compilers are unable to perform bounds checking on the unspecified dimension. Thus, current best practice is to use explicit interfaces and assumed-shape arrays (e.g., A(:,:)) in order to prevent out-of-bounds memory access. This stated, the use of assumed-shape arrays can be problematic when needing to pass sub-matrices to other functions, as doing so requires slicing, which often results in compilers creating internal copies of array data.
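
To see why the leading dimension convention makes passing sub-matrices cheap, consider the following JavaScript sketch. It models a column-major matrix as a flat typed array and addresses a sub-matrix using only a starting offset and the parent's leading dimension, without copying any data. The get helper and the example values are hypothetical and use zero-based indices.

```javascript
// A 4x4 matrix stored in column-major order (leading dimension LDA = 4):
const LDA = 4;
const A = new Float64Array([
    1.0, 2.0, 3.0, 4.0,     // column 0
    5.0, 6.0, 7.0, 8.0,     // column 1
    9.0, 10.0, 11.0, 12.0,  // column 2
    13.0, 14.0, 15.0, 16.0  // column 3
]);

// Return the element in row `i` and column `j` of a column-major matrix
// which begins at index `offset` and has leading dimension `lda`:
function get(buf, offset, lda, i, j) {
    return buf[offset + i + (j*lda)];
}

// Address the 2x2 sub-matrix whose first element is at row 1 and column 2
// of `A`. Only the offset changes; the leading dimension is still that of
// the parent matrix:
const offset = 1 + (2*LDA);

console.log(get(A, offset, LDA, 0, 0));
// => 10
console.log(get(A, offset, LDA, 1, 1));
// => 15
```

This is the same trick a Fortran caller uses when passing A(2,3) as the starting address of a sub-matrix along with the original LDA; note that nothing prevents indexing beyond the intended block, which is the flip side of forgoing bounds checking.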

Migrating to Fortran 95

Needless to say, it took me a while to adjust to LAPACK conventions and adopt a LAPACK mindset. However, being something of a purist, if I was going to be porting over routines anyway, I at least wanted to bring those routines I did manage to port into a more modern age in hopes of improving code readability and future maintenance. So, after discussing things with stdlib maintainers, I settled on migrating routines to Fortran 95, which, while not the latest and greatest Fortran version, seemed to strike the right balance between maintaining the look-and-feel of the original implementations, ensuring (good enough) backward compatibility, and taking advantage of newer syntactical features.

Test Coverage

One of the problems with pursuing a bottom-up approach to adding LAPACK support is that explicit unit tests for lower-level utility routines are often non-existent in LAPACK. LAPACK's test suite largely employs a hierarchical testing philosophy in which testing higher-level routines is assumed to ensure that their dependent lower-level routines are functioning correctly as part of an overall workflow. While one can argue that focusing on integration testing over unit testing for lower-level routines is reasonable, as adding tests for every routine could potentially increase the maintenance burden and complexity of LAPACK's testing framework, it means that we couldn't readily rely on prior art for unit testing and would have to come up with comprehensive standalone unit tests for each lower-level routine on our own.

Documentation

In a similar vein to test coverage, outside of LAPACK itself, finding real-world documented examples showcasing the use of lower-level routines was challenging. While LAPACK routines are consistently preceded by a documentation comment providing descriptions of input arguments and possible return values, without code examples, visualizing and grokking expected input and output values can be challenging, especially when dealing with specialized matrices. And while neither the absence of unit tests nor the absence of documented examples is the end of the world, it meant that adding LAPACK support to stdlib would be more of a slog than I expected. Writing benchmarks, tests, examples, and documentation was simply going to require more time and effort, potentially limiting the number of routines I could implement during the internship.

Memory layouts

When storing matrix elements in linear memory, one has two choices: either store columns contiguously or rows contiguously (see Figure 2). The former memory layout is referred to as column-major order and the latter as row-major order.

Figure 2: Schematic demonstrating storing matrix elements in linear memory in either (a) column-major (Fortran-style) or (b) row-major (C-style) order. The choice of which layout to use is largely a matter of convention.

The choice of which layout to use is largely a matter of convention. For example, Fortran stores elements in column-major order, and C stores elements in row-major order. Higher-level libraries, such as NumPy and stdlib, support both column- and row-major orders, allowing you to configure the layout of a multi-dimensional array during array creation.

import asarray from '@stdlib/ndarray-array';

// Create a row-major array:
const x = asarray([1.0, 2.0, 3.0, 4.0], {
    'shape': [2, 2],
    'order': 'row-major'
});

// Create a column-major array:
const y = asarray([1.0, 3.0, 2.0, 4.0], {
    'shape': [2, 2],
    'order': 'column-major'
});
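
As a rough illustration of what the two orders mean in terms of linear memory, the following sketch maps a row and column subscript to a linear index for each layout. The helper names rowMajorIndex and columnMajorIndex are hypothetical and are not part of stdlib; the sketch assumes contiguous storage with no offset.

```javascript
// Hypothetical helpers (not stdlib APIs) for an M-by-N matrix stored contiguously.

// Row-major: elements of the same row are adjacent, so `j` changes fastest.
function rowMajorIndex(i, j, M, N) {
    return (i*N) + j;
}

// Column-major: elements of the same column are adjacent, so `i` changes fastest.
function columnMajorIndex(i, j, M, N) {
    return i + (j*M);
}

// The 2-by-2 matrix [[1, 2], [3, 4]] in each layout:
const rowMajorData = [1.0, 2.0, 3.0, 4.0];
const columnMajorData = [1.0, 3.0, 2.0, 4.0];

// Both expressions resolve the element in the first row and second column:
const a = rowMajorData[rowMajorIndex(0, 1, 2, 2)];
// 2.0

const b = columnMajorData[columnMajorIndex(0, 1, 2, 2)];
// 2.0
```

The same logical element lives at a different linear index depending on the layout, which is why the data arrays in the two asarray calls above list their values in a different order.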

While neither memory layout is inherently better than the other, arranging data to ensure sequential access in accordance with the conventions of the underlying storage model is critical in ensuring optimal performance. Modern CPUs process sequential data more efficiently than non-sequential data, primarily because CPU caches exploit spatial locality of reference.

To demonstrate the performance impact of sequential vs non-sequential element access, consider the following function which copies all the elements from an MxN matrix A to another MxN matrix B and which does so assuming that matrix elements are stored in column-major order.

/**
* Copies elements from `A` to `B`.
*
* @param {integer} M - number of rows
* @param {integer} N - number of columns
* @param {Array} A - source matrix
* @param {integer} strideA1 - index increment to move to the next element in a column
* @param {integer} strideA2 - index increment to move to the next element in a row
* @param {integer} offsetA - index of the first indexed element in `A`
* @param {Array} B - destination matrix
* @param {integer} strideB1 - index increment to move to the next element in a column
* @param {integer} strideB2 - index increment to move to the next element in a row
* @param {integer} offsetB - index of the first indexed element in `B`
*/
function copy(M, N, A, strideA1, strideA2, offsetA, B, strideB1, strideB2, offsetB) {
    // Initialize loop bounds:
    const S0 = M;
    const S1 = N;

    // For column-major matrices, the first dimension has the fastest changing index.
    // Compute "pointer" increments accordingly:
    const da0 = strideA1;                  // pointer increment for innermost loop
    const da1 = strideA2 - (S0*strideA1);  // pointer increment for outermost loop
    const db0 = strideB1;
    const db1 = strideB2 - (S0*strideB1);

    // Initialize "pointers" to the first indexed elements in the respective arrays:
    let ia = offsetA;
    let ib = offsetB;

    // Iterate over matrix dimensions:
    for (let i1 = 0; i1 < S1; i1++) {
        for (let i0 = 0; i0 < S0; i0++) {
            B[ib] = A[ia];
            ia += da0;
            ib += db0;
        }
        ia += da1;
        ib += db1;
    }
}

Let A and B be the following 3x2 matrices:

A = \begin{bmatrix}1 & 2 \\ 3 & 4 \\ 5 & 6\end{bmatrix},\ B = \begin{bmatrix}0 & 0 \\ 0 & 0 \\ 0 & 0\end{bmatrix}

When both A and B are stored in column-major order, we can call the copy routine as follows:

const A = [1, 3, 5, 2, 4, 6];
const B = [0, 0, 0, 0, 0, 0];

copy(3, 2, A, 1, 3, 0, B, 1, 3, 0);

If, however, A and B are both stored in row-major order, the call signature changes to

const A = [1, 2, 3, 4, 5, 6];
const B = [0, 0, 0, 0, 0, 0];

copy(3, 2, A, 2, 1, 0, B, 2, 1, 0);

Notice that, in the latter scenario, we fail to access elements in sequential order within the innermost loop, as da0 is 2 and da1 is -5 and similarly for db0 and db1. Instead, the array index "pointers" repeatedly skip ahead before returning to earlier elements in linear memory, with ia = {0, 2, 4, 1, 3, 5} and ib the same. In Figure 3, we show the performance impact of non-sequential access.
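
To make the access pattern concrete, here is a small sketch which reuses the loop structure of copy but, instead of copying, records the linear indices visited. The function name accessOrder is hypothetical and exists only for this illustration.

```javascript
// Records the linear indices visited by the loop structure used in `copy`.
function accessOrder(M, N, stride1, stride2, offset) {
    const d0 = stride1;                // increment for the innermost loop
    const d1 = stride2 - (M*stride1);  // increment for the outermost loop
    const visited = [];
    let idx = offset;
    for (let i1 = 0; i1 < N; i1++) {
        for (let i0 = 0; i0 < M; i0++) {
            visited.push(idx);
            idx += d0;
        }
        idx += d1;
    }
    return visited;
}

// Column-major strides for a 3-by-2 matrix:
const columnMajorOrder = accessOrder(3, 2, 1, 3, 0);
// [0, 1, 2, 3, 4, 5]

// Row-major strides for a 3-by-2 matrix:
const rowMajorOrder = accessOrder(3, 2, 2, 1, 0);
// [0, 2, 4, 1, 3, 5]
```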

Figure 3: Performance comparison when providing square column-major versus row-major matrices to copy when copy assumes sequential element access according to column-major order. The x-axis enumerates increasing matrix sizes (i.e., number of elements). All rates are normalized relative to column-major results for a corresponding matrix size.

From the figure, we may observe that column- and row-major performance is roughly equivalent until we operate on square matrices having more than 1e5 elements (M = N = ~316). For 1e6 elements (M = N = ~1000), providing a row-major matrix to copy results in a greater than 25% performance decrease. For 1e7 elements (M = N = ~3160), we observe a greater than 85% performance decrease. The significant performance impact may be attributed to decreased locality of reference when operating on row-major matrices having large row sizes.

Given that it is written in Fortran, LAPACK assumes column-major access order and implements its algorithms accordingly. This presents issues for libraries, such as stdlib, which not only support row-major order, but make it their default memory layout. Were we to simply port LAPACK's Fortran implementations to JavaScript, users providing row-major matrices would experience adverse performance impacts stemming from non-sequential access.

To mitigate adverse performance impacts, we borrowed an idea from BLIS, a BLAS-like library supporting both row- and column-major memory layouts in BLAS routines. When porting routines from Fortran to JavaScript and C, we decided to create modified LAPACK implementations that explicitly accommodate both column- and row-major memory layouts through separate stride parameters for each dimension. For some implementations, such as dlacpy, which is similar to the copy function defined above, incorporating separate and independent strides is straightforward, often involving stride tricks and loop interchange. For others, the modifications turned out to be much less straightforward due to specialized matrix handling, varying access patterns, and combinatorial parameterization.
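
As a minimal sketch of what loop interchange can look like for the copy example, the following variant iterates over whichever dimension has the smaller stride in the innermost loop. The name copyAnyOrder is hypothetical, the choice of loop order is based only on the strides of A (a simplifying assumption), and this is not the actual stdlib dlacpy implementation.

```javascript
// A sketch of loop interchange: iterate the dimension with the smaller stride innermost.
function copyAnyOrder(M, N, A, strideA1, strideA2, offsetA, B, strideB1, strideB2, offsetB) {
    let S0 = M;
    let S1 = N;
    let sa0 = strideA1;
    let sa1 = strideA2;
    let sb0 = strideB1;
    let sb1 = strideB2;

    // If the second dimension has the smaller stride (e.g., row-major), swap loop roles:
    if (Math.abs(strideA2) < Math.abs(strideA1)) {
        S0 = N;
        S1 = M;
        sa0 = strideA2;
        sa1 = strideA1;
        sb0 = strideB2;
        sb1 = strideB1;
    }
    // Compute "pointer" increments for the (possibly interchanged) loops:
    const da0 = sa0;
    const da1 = sa1 - (S0*sa0);
    const db0 = sb0;
    const db1 = sb1 - (S0*sb0);

    let ia = offsetA;
    let ib = offsetB;
    for (let i1 = 0; i1 < S1; i1++) {
        for (let i0 = 0; i0 < S0; i0++) {
            B[ib] = A[ia];
            ia += da0;
            ib += db0;
        }
        ia += da1;
        ib += db1;
    }
    return B;
}

// Row-major 3-by-2 input; elements are visited in the order 0, 1, 2, 3, 4, 5:
const A = [1, 2, 3, 4, 5, 6];
const B = [0, 0, 0, 0, 0, 0];
copyAnyOrder(3, 2, A, 2, 1, 0, B, 2, 1, 0);
// B => [1, 2, 3, 4, 5, 6]
```

Because each matrix carries its own pair of strides, the same routine can also copy between layouts, such as from a column-major source to a row-major destination.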

ndarrays

LAPACK routines primarily operate on matrices stored in linear memory and whose elements are accessed according to specified dimensions and the stride of the leading (i.e., first) dimension. Dimensions specify the number of elements in each row and column, respectively. The stride specifies how many elements in linear memory must be skipped in order to access the next element of a row. LAPACK assumes that elements belonging to the same column are always contiguous (i.e., adjacent in linear memory). Figure 4 provides a visual representation of LAPACK conventions (specifically, schematics (a) and (b)).

Figure 4: Schematics illustrating the generalization of LAPACK strided array conventions to non-contiguous strided arrays. a) A 5-by-5 contiguous matrix stored in column-major order. b) A 3-by-3 non-contiguous sub-matrix stored in column-major order. Sub-matrices can be operated on in LAPACK by providing a pointer to the first indexed element and specifying the stride of the leading (i.e., first) dimension. In this case, the stride of leading dimension is five, even though there are only three elements per column, due to the non-contiguity of sub-matrix elements in linear memory when stored as part of a larger matrix. In LAPACK, the stride of the trailing (i.e., second) dimension is always assumed to be unity. c) A 3-by-3 non-contiguous sub-matrix stored in column-major order having non-unit strides and generalizing LAPACK stride conventions to both leading and trailing dimensions. This generalization underpins stdlib's multi-dimensional arrays (also referred to as "ndarrays").
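
To illustrate the conventional approach, the following sketch accesses a 3-by-3 sub-matrix of a 5-by-5 column-major matrix using only a starting position and LDA. The accessor get is hypothetical, and the position of the sub-matrix within the parent matrix is chosen arbitrarily for illustration and may not match the figure.

```javascript
// A 5-by-5 matrix stored contiguously in column-major order (values equal their linear indices):
const LDA = 5;
const buf = new Float64Array(25);
for (let k = 0; k < buf.length; k++) {
    buf[k] = k;
}

// In C, one would pass a pointer to the first sub-matrix element. Here, a typed array view plays that role.
// The sub-matrix begins at row 1, column 1 of the parent matrix:
const start = 1 + (1*LDA);
const sub = buf.subarray(start);

// Hypothetical accessor following the LAPACK convention A(i,j) -> A[i + j*LDA]:
function get(A, lda, i, j) {
    return A[i + (j*lda)];
}

// Gather the sub-matrix elements column by column:
const elements = [];
for (let j = 0; j < 3; j++) {
    for (let i = 0; i < 3; i++) {
        elements.push(get(sub, LDA, i, j));
    }
}
// elements => [6, 7, 8, 11, 12, 13, 16, 17, 18]
```

Each column holds three adjacent elements, but consecutive columns begin five elements apart, which is exactly the information conveyed by LDA.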

Libraries, such as NumPy and stdlib, generalize LAPACK's strided array conventions to support

non-unit strides in the last dimension (see Figure 4 (c)). LAPACK assumes that the last dimension of a matrix always has unit stride (i.e., elements within a column are stored contiguously in linear memory).
negative strides for any dimension. LAPACK requires that the stride of a leading matrix dimension be positive.
multi-dimensional arrays having more than two dimensions. LAPACK only explicitly supports strided vectors and (sub)matrices.

Support for non-unit strides in the last dimension ensures support for O(1) creation of non-contiguous views of linear memory without requiring explicit data movement. These views are often called "slices". As an example, consider the following code snippet which creates such views using APIs provided by stdlib.

import linspace from '@stdlib/array-linspace';
import FancyArray from '@stdlib/ndarray-fancy';

// Define a two-dimensional array similar to that shown in Figure 4 (a):
const x = new FancyArray('float64', linspace(0, 24, 25), [5, 5], [5, 1], 0, 'row-major');
// returns <FancyArray>

// Create a sub-matrix view similar to that shown in Figure 4 (b):
const v1 = x['1:4,:3'];
// returns <FancyArray>

// Create a sub-matrix view similar to that shown in Figure 4 (c):
const v2 = x['::2,::2'];
// returns <FancyArray>

// Assert that all arrays share the same underlying memory buffer:
const b1 = (v1.data.buffer === x.data.buffer);
// returns true

const b2 = (v2.data.buffer === x.data.buffer);
// returns true

Without support for non-unit strides in the last dimension, returning a view from the expression x['::2,::2'] would not be possible, as one would need to copy selected elements to a new linear memory buffer in order to ensure contiguity.
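
To see why such views are cheap, consider the following sketch which derives the shape, strides, and offset of a view from those of its parent. The helper sliceMeta is hypothetical and deliberately simplified (non-negative starts and positive steps only, with no bounds normalization); it is not how stdlib parses slice expressions.

```javascript
// Hypothetical helper: computes view metadata for slices given as [start, stop, step] per dimension.
function sliceMeta(shape, strides, offset, slices) {
    const outShape = [];
    const outStrides = [];
    let outOffset = offset;
    for (let d = 0; d < shape.length; d++) {
        const [start, stop, step] = slices[d];
        outShape.push(Math.ceil((stop-start) / step));
        outStrides.push(strides[d] * step);
        outOffset += start * strides[d];
    }
    return {
        'shape': outShape,
        'strides': outStrides,
        'offset': outOffset
    };
}

// Parent: 5-by-5, row-major, strides [5, 1], offset 0.

// Equivalent of '1:4,:3':
const m1 = sliceMeta([5, 5], [5, 1], 0, [[1, 4, 1], [0, 3, 1]]);
// { shape: [3, 3], strides: [5, 1], offset: 5 }

// Equivalent of '::2,::2':
const m2 = sliceMeta([5, 5], [5, 1], 0, [[0, 5, 2], [0, 5, 2]]);
// { shape: [3, 3], strides: [10, 2], offset: 0 }
```

In both cases, only a handful of integers change. The second view has a stride of 2 in its last dimension, which is precisely the case conventional LAPACK signatures cannot express.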

Figure 5: Schematics illustrating the use of stride manipulation to create flipped and rotated views of matrix elements stored in linear memory. For all sub-schematics, strides are listed as [trailing_dimension, leading_dimension]. Implicit for each schematic is an "offset", which indicates the index of the first indexed element such that, for a matrix A, the element Aij is resolved according to i⋅strides[1] + j⋅strides[0] + offset. a) Given a 3-by-3 matrix stored in column-major order, one can manipulate the strides of the leading and trailing dimensions to create views in which matrix elements along one or more axes are accessed in reverse order. b) Using similar stride manipulation, one can create rotated views of matrix elements relative to their arrangement within linear memory.

Support for negative strides enables O(1) reversal and rotation of elements along one or more dimensions (see Figure 5). For example, to flip a matrix top-to-bottom and left-to-right, one need only negate the strides. Building on the previous code snippet, the following code snippet demonstrates reversing elements about one or more axes.

import linspace from '@stdlib/array-linspace';
import FancyArray from '@stdlib/ndarray-fancy';

// Define a two-dimensional array similar to that shown in Figure 5 (a):
const x = new FancyArray('float64', linspace(0, 8, 9), [3, 3], [3, 1], 0, 'row-major');

// Reverse the order of the rows (i.e., flip top-to-bottom):
const v1 = x['::-1,:'];

// Reverse the order of the columns (i.e., flip left-to-right):
const v2 = x[':,::-1'];

// Reverse elements along both columns and rows:
const v3 = x['::-1,::-1'];

// Assert that all arrays share the same underlying memory buffer:
const b1 = (v1.data.buffer === x.data.buffer);
// returns true

const b2 = (v2.data.buffer === x.data.buffer);
// returns true

const b3 = (v3.data.buffer === x.data.buffer);
// returns true

Implicit in the discussion of negative strides is the need for an "offset" parameter which indicates the index of the first indexed element in linear memory. For a strided multi-dimensional array A and a list of strides s, the index corresponding to element A_{ij \cdots n} can be resolved according to the equation

\textrm{idx} = \textrm{offset} + i \cdot s_0 + j \cdot s_1 + \ldots + n \cdot s_{N-1}

where N is the number of array dimensions and s_k corresponds to the kth stride.
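
As a worked example of the equation, the following sketch resolves linear indices for a 3-by-3 row-major matrix and for a view of the same data reversed along both dimensions. The function name resolveIndex is hypothetical.

```javascript
// Resolves a linear index from an offset, a list of strides, and a list of subscripts.
function resolveIndex(offset, strides, subscripts) {
    let idx = offset;
    for (let k = 0; k < strides.length; k++) {
        idx += subscripts[k] * strides[k];
    }
    return idx;
}

// A 3-by-3 row-major matrix whose values equal their linear indices:
const data = [0, 1, 2, 3, 4, 5, 6, 7, 8];

// Original view: strides [3, 1] and offset 0.
const first = data[resolveIndex(0, [3, 1], [0, 0])];
// 0

// View reversed along both dimensions: strides [-3, -1] and offset 8 (the last element).
const flippedFirst = data[resolveIndex(8, [-3, -1], [0, 0])];
// 8

const flippedLast = data[resolveIndex(8, [-3, -1], [2, 2])];
// 0
```

Note that negating the strides alone is not enough. The offset must also move to the element which becomes the first indexed element of the reversed view.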

In BLAS and LAPACK routines supporting negative strides—something which is only supported when operating on strided vectors (e.g., see daxpy above)—the index offset is computed using logic similar to the following code snippet:

if (stride < 0) {
    offset = (1-M) * stride;
} else {
    offset = 0;
}

where M is the number of vector elements. This implicitly assumes that a provided data pointer points to the beginning of linear memory for a vector. In languages supporting pointers, such as C, in order to operate on a different region of linear memory, one typically adjusts a pointer using pointer arithmetic prior to function invocation, which is relatively cheap and straightforward, at least for the one-dimensional case.
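
As a quick check of the offset logic, the sketch below lists the indices visited for a three-element vector with a stride of 2 and with a stride of -2. The helper names startIndex and visited are hypothetical.

```javascript
// Computes the starting index for a strided vector using the BLAS convention.
function startIndex(M, stride) {
    return (stride < 0) ? (1-M) * stride : 0;
}

// Lists the indices visited when iterating over `M` elements:
function visited(M, stride) {
    const out = [];
    let idx = startIndex(M, stride);
    for (let i = 0; i < M; i++) {
        out.push(idx);
        idx += stride;
    }
    return out;
}

const forward = visited(3, 2);
// [0, 2, 4]

const backward = visited(3, -2);
// [4, 2, 0]
```

With a negative stride, iteration begins at the far end of the region and walks back toward index 0, so the same three elements are visited in reverse order.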

For example, returning to c_daxpy as defined above, we can use pointer arithmetic to limit element access to five elements within linear memory beginning at the eleventh and sixteenth elements (note: zero-based indexing) of an input and output array, respectively, as shown in the following code snippet.

// Define data arrays:
const double X[] = {...};
double Y[] = {...};

// Specify the indices of the elements which begin a desired memory region:
const int xoffset = 10;
const int yoffset = 15;

// Limit the operation to only elements within the desired memory region:
c_daxpy(5, 5.0, X+xoffset, 1, Y+yoffset, 1);

However, in JavaScript, which does not support explicit pointer arithmetic for binary buffers, one must explicitly instantiate new typed array objects having a desired byte offset. In the following code snippet, in order to achieve the same results as the C example above, we must resolve a typed array constructor, compute a new byte offset, compute a new typed array length, and create a new typed array instance.

/**
* Returns a typed array view having the same data type as a provided input typed
* array and starting at a specified index offset.
*
* @param {TypedArray} x - input array
* @param {integer} offset - starting index
* @returns {TypedArray} typed array view
*/
function offsetView(x, offset) {
    return new x.constructor(x.buffer, x.byteOffset+(x.BYTES_PER_ELEMENT*offset), x.length-offset);
}

// ...

const x = new Float64Array([...]);
const y = new Float64Array([...]);

// ...

daxpy(5, 5.0, offsetView(x, 10), 1, offsetView(y, 15), 1);

For large array sizes, the cost of typed array instantiation is negligible compared to the time spent accessing and operating on individual array elements; however, for smaller array sizes, object instantiation can significantly impact performance.

Accordingly, in order to avoid adverse object instantiation performance impacts, stdlib decouples an ndarray's data buffer from the location of the buffer element corresponding to the beginning of an ndarray view. This allows the slice expressions x[2:,3:] and x[3:,1:] to return new ndarray views without needing to instantiate new buffer instances, as demonstrated in the following code snippet.

import linspace from '@stdlib/array-linspace';
import FancyArray from '@stdlib/ndarray-fancy';

const x = new FancyArray('float64', linspace(0, 24, 25), [5, 5], [5, 1], 0, 'row-major');

const v1 = x['2:,3:'];
const v2 = x['3:,1:'];

// Assert that all arrays share the same typed array data instance:
const b1 = (v1.data === x.data);
// returns true

const b2 = (v2.data === x.data);
// returns true

As a consequence of decoupling a data buffer from the beginning of an ndarray view, we similarly sought to avoid having to instantiate new typed array instances when calling into LAPACK routines with ndarray data. This meant creating modified LAPACK API signatures supporting explicit offset parameters for all strided vectors and matrices.

For simplicity, let's return to the JavaScript implementation of daxpy, which was previously defined above.

function daxpy(N, alpha, X, strideX, Y, strideY) {
    let ix;
    let iy;
    let i;
    if (N <= 0) {
        return;
    }
    if (alpha === 0.0) {
        return;
    }
    if (strideX < 0) {
        ix = (1-N) * strideX;
    } else {
        ix = 0;
    }
    if (strideY < 0) {
        iy = (1-N) * strideY;
    } else {
        iy = 0;
    }
    for (i = 0; i < N; i++) {
        Y[iy] += alpha * X[ix];
        ix += strideX;
        iy += strideY;
    }
    return;
}

As demonstrated in the following code snippet, we can modify the above signature and implementation such that the responsibility for resolving the first indexed element is shifted to the API consumer.

function daxpy_ndarray(N, alpha, X, strideX, offsetX, Y, strideY, offsetY) {
    let ix;
    let iy;
    let i;
    if (N <= 0) {
        return;
    }
    if (alpha === 0.0) {
        return;
    }
    ix = offsetX;
    iy = offsetY;
    for (i = 0; i < N; i++) {
        Y[iy] += alpha * X[ix];
        ix += strideX;
        iy += strideY;
    }
    return;
}

For ndarrays, resolution happens during ndarray instantiation, making the invocation of daxpy_ndarray with ndarray data a straightforward passing of associated ndarray metadata. This is demonstrated in the following code snippet.

import linspace from '@stdlib/array-linspace';
import FancyArray from '@stdlib/ndarray-fancy';

// Create two ndarrays:
const x = new FancyArray('float64', linspace(0, 24, 25), [5, 5], [5, 1], 0, 'row-major');
const y = new FancyArray('float64', linspace(0, 24, 25), [5, 5], [5, 1], 0, 'row-major');

// Create a view of `x` corresponding to every other element in the 3rd row:
const v1 = x['2,1::2'];

// Create a view of `y` corresponding to every other element in the 3rd column:
const v2 = y['1::2,2'];

// Operate on the vectors:
daxpy_ndarray(v1.length, 5.0, v1.data, v1.strides[0], v1.offset, v2.data, v2.strides[0], v2.offset);
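
For readers who want to trace the arithmetic without ndarrays, here is a self-contained sketch operating directly on typed arrays. The function axpyWithOffsets is a hypothetical stand-in mirroring daxpy_ndarray (minus the early returns), and the strides and offsets are written out by hand under the assumption that the views above follow standard slice semantics for a 5-by-5 row-major array.

```javascript
// Minimal offset-based axpy (hypothetical stand-in for `daxpy_ndarray`):
function axpyWithOffsets(N, alpha, X, strideX, offsetX, Y, strideY, offsetY) {
    let ix = offsetX;
    let iy = offsetY;
    for (let i = 0; i < N; i++) {
        Y[iy] += alpha * X[ix];
        ix += strideX;
        iy += strideY;
    }
    return Y;
}

// Two 5-by-5 row-major buffers whose values equal their linear indices:
const xbuf = Float64Array.from({ 'length': 25 }, (v, i) => i);
const ybuf = Float64Array.from({ 'length': 25 }, (v, i) => i);

// View '2,1::2' of x: 2 elements, stride 2, offset (2*5)+1 = 11.
// View '1::2,2' of y: 2 elements, stride 2*5 = 10, offset (1*5)+2 = 7.
axpyWithOffsets(2, 5.0, xbuf, 2, 11, ybuf, 10, 7);

// ybuf[7]  = 7  + 5*xbuf[11] = 62
// ybuf[17] = 17 + 5*xbuf[13] = 82
```

No intermediate typed array views are created. All other elements of ybuf are left untouched.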

Similar to BLIS, we saw value in both conventional LAPACK API signatures (e.g., for backward compatibility) and modified API signatures (e.g., for minimizing adverse performance impacts), and thus, we settled on a plan to provide both conventional and modified APIs for each LAPACK routine. To minimize code duplication, we aimed to implement a common lower-level "base" implementation which could then be wrapped by higher-level APIs. While the changes for the BLAS routine daxpy shown above may appear relatively straightforward, the transformation of a conventional LAPACK routine and its expected behavior to a generalized implementation was often much less so.

dlaswp

Enough with the challenges! What does a final product look like?!

Let's come full circle and bring this back to dlaswp, a LAPACK routine for performing a series of row interchanges on an input matrix according to a list of pivot indices. The following code snippet shows the reference LAPACK Fortran implementation.

SUBROUTINE dlaswp( N, A, LDA, K1, K2, IPIV, INCX )
*
*  -- LAPACK auxiliary routine --
*  -- LAPACK is a software package provided by Univ. of Tennessee,    --
*  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
*
*     .. Scalar Arguments ..
      INTEGER            INCX, K1, K2, LDA, N
*     ..
*     .. Array Arguments ..
      INTEGER            IPIV( * )
      DOUBLE PRECISION   A( LDA, * )
*     ..
*
* =====================================================================
*
*     .. Local Scalars ..
      INTEGER            I, I1, I2, INC, IP, IX, IX0, J, K, N32
      DOUBLE PRECISION   TEMP
*     ..
*     .. Executable Statements ..
*
*     Interchange row I with row IPIV(K1+(I-K1)*abs(INCX)) for each of rows
*     K1 through K2.
*
      IF( incx.GT.0 ) THEN
         ix0 = k1
         i1 = k1
         i2 = k2
         inc = 1
      ELSE IF( incx.LT.0 ) THEN
         ix0 = k1 + ( k1-k2 )*incx
         i1 = k2
         i2 = k1
         inc = -1
      ELSE
         RETURN
      END IF
*
      n32 = ( n / 32 )*32
      IF( n32.NE.0 ) THEN
         DO 30 j = 1, n32, 32
            ix = ix0
            DO 20 i = i1, i2, inc
               ip = ipiv( ix )
               IF( ip.NE.i ) THEN
                  DO 10 k = j, j + 31
                     temp = a( i, k )
                     a( i, k ) = a( ip, k )
                     a( ip, k ) = temp
   10             CONTINUE
               END IF
               ix = ix + incx
   20       CONTINUE
   30    CONTINUE
      END IF
      IF( n32.NE.n ) THEN
         n32 = n32 + 1
         ix = ix0
         DO 50 i = i1, i2, inc
            ip = ipiv( ix )
            IF( ip.NE.i ) THEN
               DO 40 k = n32, n
                  temp = a( i, k )
                  a( i, k ) = a( ip, k )
                  a( ip, k ) = temp
   40          CONTINUE
            END IF
            ix = ix + incx
   50    CONTINUE
      END IF
*
      RETURN
*
*     End of DLASWP
*
      END

To facilitate interfacing with the Fortran implementation from C, LAPACK provides a two-level C interface called LAPACKE, which wraps Fortran implementations and makes accommodations for both row- and column-major input and output matrices. The middle-level interface for dlaswp is shown in the following code snippet.

lapack_int LAPACKE_dlaswp_work( int matrix_layout, lapack_int n, double* a,
                                lapack_int lda, lapack_int k1, lapack_int k2,
                                const lapack_int* ipiv, lapack_int incx )
{
    lapack_int info = 0;
    if( matrix_layout == LAPACK_COL_MAJOR ) {
        /* Call LAPACK function and adjust info */
        LAPACK_dlaswp( &n, a, &lda, &k1, &k2, ipiv, &incx );
        if( info < 0 ) {
            info = info - 1;
        }
    } else if( matrix_layout == LAPACK_ROW_MAJOR ) {
        lapack_int lda_t = MAX(1,k2);
        lapack_int i;
        for( i = k1; i <= k2; i++ ) {
            lda_t = MAX( lda_t, ipiv[k1 + ( i - k1 ) * ABS( incx ) - 1] );
        }
        double* a_t = NULL;
        /* Check leading dimension(s) */
        if( lda < n ) {
            info = -4;
            LAPACKE_xerbla( "LAPACKE_dlaswp_work", info );
            return info;
        }
        /* Allocate memory for temporary array(s) */
        a_t = (double*)LAPACKE_malloc( sizeof(double) * lda_t * MAX(1,n) );
        if( a_t == NULL ) {
            info = LAPACK_TRANSPOSE_MEMORY_ERROR;
            goto exit_level_0;
        }
        /* Transpose input matrices */
        LAPACKE_dge_trans( matrix_layout, lda_t, n, a, lda, a_t, lda_t );
        /* Call LAPACK function and adjust info */
        LAPACK_dlaswp( &n, a_t, &lda_t, &k1, &k2, ipiv, &incx );
        info = 0;  /* LAPACK call is ok! */
        /* Transpose output matrices */
        LAPACKE_dge_trans( LAPACK_COL_MAJOR, lda_t, n, a_t, lda_t, a, lda );
        /* Release memory and exit */
        LAPACKE_free( a_t );
exit_level_0:
        if( info == LAPACK_TRANSPOSE_MEMORY_ERROR ) {
            LAPACKE_xerbla( "LAPACKE_dlaswp_work", info );
        }
    } else {
        info = -1;
        LAPACKE_xerbla( "LAPACKE_dlaswp_work", info );
    }
    return info;
}

When called with a column-major matrix a, the wrapper LAPACKE_dlaswp_work simply passes along provided arguments to the Fortran implementation. However, when called with a row-major matrix a, the wrapper must allocate memory, explicitly transpose and copy a to a temporary matrix a_t, recompute the stride of the leading dimension, invoke dlaswp with a_t, transpose and copy the results stored in a_t to a, and finally free allocated memory. That is a fair amount of work and is common across most LAPACK routines.

The following code snippet shows the reference LAPACK implementation ported to JavaScript, with support for leading and trailing dimension strides, index offsets, and a strided vector containing pivot indices.

// File: base.js

// ...

const BLOCK_SIZE = 32;

// ...

function base(N, A, strideA1, strideA2, offsetA, k1, k2, inck, IPIV, strideIPIV, offsetIPIV) {
    let nrows;
    let n32;
    let tmp;
    let row;
    let ia1;
    let ia2;
    let ip;
    let o;

    // Compute the number of rows to be interchanged:
    if (inck > 0) {
        nrows = k2 - k1;
    } else {
        nrows = k1 - k2;
    }
    nrows += 1;

    // If the order is row-major, we can delegate to the Level 1 routine `dswap` for interchanging rows...
    if (isRowMajor([strideA1, strideA2])) {
        ip = offsetIPIV;
        for (let i = 0, k = k1; i < nrows; i++, k += inck) {
            row = IPIV[ip];
            if (row !== k) {
                dswap(N, A, strideA2, offsetA+(k*strideA1), A, strideA2, offsetA+(row*strideA1));
            }
            ip += strideIPIV;
        }
        return A;
    }
    // If the order is column-major, we need to use loop tiling to ensure efficient cache access when accessing matrix elements...
    n32 = floor(N/BLOCK_SIZE) * BLOCK_SIZE;
    if (n32 !== 0) {
        for (let j = 0; j < n32; j += BLOCK_SIZE) {
            ip = offsetIPIV;
            for (let i = 0, k = k1; i < nrows; i++, k += inck) {
                row = IPIV[ip];
                if (row !== k) {
                    ia1 = offsetA + (k*strideA1);
                    ia2 = offsetA + (row*strideA1);
                    for (let n = j; n < j+BLOCK_SIZE; n++) {
                        o = n * strideA2;
                        tmp = A[ia1+o];
                        A[ia1+o] = A[ia2+o];
                        A[ia2+o] = tmp;
                    }
                }
                ip += strideIPIV;
            }
        }
    }
    if (n32 !== N) {
        ip = offsetIPIV;
        for (let i = 0, k = k1; i < nrows; i++, k += inck) {
            row = IPIV[ip];
            if (row !== k) {
                ia1 = offsetA + (k*strideA1);
                ia2 = offsetA + (row*strideA1);
                for (let n = n32; n < N; n++) {
                    o = n * strideA2;
                    tmp = A[ia1+o];
                    A[ia1+o] = A[ia2+o];
                    A[ia2+o] = tmp;
                }
            }
            ip += strideIPIV;
        }
    }
    return A;
}

To provide an API having consistent behavior with conventional LAPACK, I then wrapped the above implementation and adapted input arguments to the "base" implementation, as shown in the following code snippet.

// File: dlaswp.js

// ...
const base = require('./base.js');

// ...

function dlaswp(order, N, A, LDA, k1, k2, IPIV, incx) {
    let tmp;
    let inc;
    let sa1;
    let sa2;
    let io;
    if (!isLayout(order)) {
        throw new TypeError(format('invalid argument. First argument must be a valid order. Value: `%s`.', order));
    }
    if (order === 'row-major' && LDA < max(1, N)) {
        throw new RangeError(format('invalid argument. Fourth argument must be greater than or equal to max(1,%d). Value: `%d`.', N, LDA));
    }
    if (incx > 0) {
        inc = 1;
        io = k1;
    } else if (incx < 0) {
        inc = -1;
        io = k1 + ((k1-k2) * incx);
        tmp = k1;
        k1 = k2;
        k2 = tmp;
    } else {
        return A;
    }
    if (order === 'column-major') {
        sa1 = 1;
        sa2 = LDA;
    } else { // order === 'row-major'
        sa1 = LDA;
        sa2 = 1;
    }
    return base(N, A, sa1, sa2, 0, k1, k2, inc, IPIV, incx, io);
}

I subsequently wrote a separate but similar wrapper which provides an API mapping more directly to stdlib's multi-dimensional arrays and which performs some special handling when the direction in which to apply pivots is negative, as shown in the following code snippet.

// File: ndarray.js

const base = require('./base.js');

// ...

function dlaswp_ndarray(N, A, strideA1, strideA2, offsetA, k1, k2, inck, IPIV, strideIPIV, offsetIPIV) {
    let tmp;
    if (inck < 0) {
        offsetIPIV += k2 * strideIPIV;
        strideIPIV *= -1;
        tmp = k1;
        k1 = k2;
        k2 = tmp;
        inck = -1;
    } else {
        offsetIPIV += k1 * strideIPIV;
        inck = 1;
    }
    return base(N, A, strideA1, strideA2, offsetA, k1, k2, inck, IPIV, strideIPIV, offsetIPIV);
}

A few points to note:

In contrast to the conventional LAPACKE API, the matrix_layout (order) parameter is not necessary in the dlaswp_ndarray and base APIs, as the order can be inferred from the provided strides.
In contrast to the conventional LAPACKE API, when an input matrix is row-major, we don't need to copy data to temporary workspace arrays, thus reducing unnecessary memory allocation.
In contrast to libraries, such as NumPy and SciPy, which interface with BLAS and LAPACK directly, when calling LAPACK routines in stdlib, we don't need to copy non-contiguous multi-dimensional data to and from temporary workspace arrays before and after invocation, respectively. Except when interfacing with hardware-optimized BLAS and LAPACK, the pursued approach helps minimize data movement and ensures performance in resource-constrained browser applications.
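
The first two points can be sketched in a few lines. In the following snippet, inferOrder and swapRows are hypothetical helpers written for illustration only (they are not the stdlib isRowMajor or dswap implementations); they show how a layout can be inferred from strides and how rows can be interchanged in place for either layout without a temporary transposed copy.

```javascript
// Hypothetical helper: infers the layout from the strides of a two-dimensional array.
function inferOrder(strideA1, strideA2) {
    // In row-major order, the last dimension has the smallest stride magnitude:
    return (Math.abs(strideA2) <= Math.abs(strideA1)) ? 'row-major' : 'column-major';
}

// Hypothetical helper: interchanges two rows in place using only strides and an offset.
function swapRows(N, A, strideA1, strideA2, offsetA, r1, r2) {
    let i1 = offsetA + (r1*strideA1);
    let i2 = offsetA + (r2*strideA1);
    for (let j = 0; j < N; j++) {
        const tmp = A[i1];
        A[i1] = A[i2];
        A[i2] = tmp;
        i1 += strideA2;
        i2 += strideA2;
    }
    return A;
}

// The 3-by-2 matrix [[1, 2], [3, 4], [5, 6]] in each layout:
const rowMajor = [1, 2, 3, 4, 5, 6];
const colMajor = [1, 3, 5, 2, 4, 6];

const o1 = inferOrder(2, 1);
// 'row-major'

const o2 = inferOrder(1, 3);
// 'column-major'

// Swap the first and third rows without copying or transposing:
swapRows(2, rowMajor, 2, 1, 0, 0, 2);
// rowMajor => [5, 6, 3, 4, 1, 2]

swapRows(2, colMajor, 1, 3, 0, 0, 2);
// colMajor => [5, 3, 1, 6, 4, 2]
```

Both results represent the same logical matrix, [[5, 6], [3, 4], [1, 2]], and in both cases the only memory touched is the matrix itself.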

For server-side applications hoping to leverage hardware-optimized libraries, such as OpenBLAS, we provide separate wrappers which adapt generalized signature arguments to their optimized API equivalents. In this context, at least for sufficiently large arrays, creating temporary copies can be worth the overhead.

Current status and next steps

Despite the challenges, unforeseen setbacks, and multiple design iterations, I am happy to report that, in addition to dlaswp above, I was able to open 35 PRs adding support for various LAPACK routines and associated utilities. Obviously not quite 1,700 routines, but a good start! :)

Nevertheless, the future is bright, and we are quite excited about this work. There's still plenty of room for improvement and additional research and development. In particular, we're keen to

explore tooling and automation.
address build issues when resolving the source files of Fortran dependencies spread across multiple stdlib packages.
roll out C and Fortran implementations and native bindings for stdlib's existing LAPACK packages.
continue growing stdlib's library of modular LAPACK routines.
identify additional areas for performance optimization.

While my Quansight Labs internship has ended, my plan is to continue adding packages and pushing this effort along. Given the immense potential and LAPACK's fundamental importance, we'd love to see this initiative of bringing LAPACK to the web continue to grow, so, if you are interested in helping drive this forward, please don't hesitate to reach out! And if you are interested in sponsoring development, the folks at Quansight would be more than happy to chat.

And with that, I would like to thank Quansight for providing this internship opportunity. I feel incredibly fortunate to have learned so much. Being an intern at Quansight was long a dream of mine, and I am very grateful to have fulfilled it. I want to extend a special thanks to Athan Reines and to Melissa Mendonça, who is an amazing mentor and all around wonderful person! And thank you to all the stdlib core developers and everyone else at Quansight for helping me out in ways both big and small along the way.

Cheers!

Reflecting on GSoC 2024

Philipp Burckhardt — Fri, 04 Oct 2024 03:14:52 +0000

Achievements, Lessons, and Tips for Future Success

An exciting summer has come to a close for stdlib with our first participation in Google Summer of Code (GSoC). GSoC is an annual program run by Google and a highlight within the open source community. It brings together passionate contributors and mentors to collaborate on open source projects. Selected contributors receive a stipend for their hard work, while organizations benefit from new features, improved project visibility, and the potential to cultivate long-term contributors.

stdlib (/ˈstændərd lɪb/ "standard lib") is a fundamental numerical library for JavaScript. Our mission is to create a scientific computing ecosystem for JavaScript and TypeScript, similar to what NumPy and SciPy are for Python. This year, we were granted four slots in GSoC, marking a significant milestone for us as a first-time participating organization.

The purpose of this post is to share our GSoC experiences to help future organizations and contributors prepare more effectively. We aim to provide insights into what worked well, what challenges we faced, and advice for making the most out of this incredible program.

Highlights of the Program

While we certainly encountered bumps along the way (more on that in a second), overall, our participation in GSoC was packed with standout moments. Our accepted contributors successfully completed their four GSoC projects.

To illustrate the impact of our participation, here are some key statistics and accomplishments from our community since the GSoC organization announcement in February:

Over 1,000 PRs opened
More than 100 unique PR contributors
Over 2,000 new commits to the codebase

We had a range of successful contributions that significantly advanced stdlib. Specifically, our four GSoC contributors worked on the following projects:

Aman Bhansali worked on BLAS bindings, overcoming the challenge of integrating complex numerical libraries into JavaScript.
Gunj Joshi developed C implementations for special mathematical functions, significantly improving the performance of our library.
Jaysukh Makvana added support for Boolean arrays, enhancing the library's functionality and usability and paving the way for NumPy-like array indexing in JavaScript.
Snehil Shah worked on enhancing the stdlib REPL for scientific computing in Node.js, making it easier for users to interact with our library and perform data analysis in their terminals.

Each project addressed critical areas in our mission to create a comprehensive numerical library for JavaScript and the web platform.

Finally, we already see a glimpse of the project attracting long-term contributors from both GSoC participants and the broader community.

An Unexpected Challenge

Despite the many positives, our journey wasn't without its share of challenges. Early on, we faced an unexpected incident that seemed straight out of a movie plot. A prospective contributor tried to sabotage a fellow applicant by impersonating them on Gitter, the open source instant messaging and chat room service where we engage with the community. After signing up via a fake Twitter/X account, the impersonator started sending unhinged messages to several of the project's core contributors. While it quickly became clear that we were communicating with an impersonator, it was an unsettling experience nonetheless. The impersonator even ended up copying the real applicant's proposal and later attempted to claim the work as their own on GitHub after the conclusion of GSoC.

In light of this experience, we advise any organizations participating in GSoC to keep in mind that competition for slots can be fierce, and that some individuals may be tempted to use subterfuge or actively jeopardize others' applications. One must be vigilant and expect the unexpected. We also recommend having a Code of Conduct (CoC) in place to address such unethical behavior and raising awareness among GSoC contributors of its existence, such as having a CoC acknowledgment checkbox on pull requests and when submitting proposals.

Lessons Learned and Advice for Future Participants

Engage Early with the Community

First and foremost, it is crucial to encourage potential contributors to start interacting with the community and codebase well before the application period. This helps build familiarity and commitment. Although we were aware of this, we could have done more to encourage early engagement and provide clearer guidance on how to get started. Going through all onboarding steps afresh may help uncover outdated information in documentation or other inconsistencies.

💡 Community Outreach: Actively promote your participation through social media, blogs, and coding forums. Use platforms like X/Twitter, LinkedIn, and relevant forums to announce your participation and engage with potential contributors.

Handling Community Queries

After our participation was announced, we were quickly bombarded with what seemed like a non-stop barrage of messages on Gitter and other communication channels, along with dozens of PRs opened each day. As the core stdlib team does not work on the project full-time, it was very challenging to keep up. We learned that it's essential to set clear expectations and boundaries early on to manage the influx of new contributors.

Managing the Onboarding Process

Answering the same questions repeatedly can be time-consuming, so having frequently asked questions (FAQs) and a well-documented onboarding process will prove to be invaluable. We also started weekly office hours for people to drop by. These had a decent turnout and proved valuable: only individuals who were genuinely interested in the project attended, which helped weed out those who were just making "drive-by" contributions. In addition to the weekly office hours, we also held two informational sessions during the application period that focused specifically on GSoC, so we could answer any questions that prospective contributors had.

After the conclusion of GSoC, we have continued to hold weekly office hours, which have been a great way to keep the community engaged!

💡 Communication Channels: Clearly outline the primary communication channels (e.g., mailing lists, chat platforms like Gitter, etc.) and how to use them.

Good First Issues

What worked less well were the "good first issues" we had opened and labeled as such on GitHub. We found that issues we thought were good first ones, such as updating documentation and examples, resulted in a very high number of low-quality submissions, often suffering from AI-hallucinated content or other problems, which caused more work for reviewers. On the other hand, other tasks, such as refining TypeScript definitions, were often too complex and challenging for newcomers.

We learned that the best first issues are those that are well-scoped, have clear instructions, and are easy to test and verify. Having a bunch of trivial issues provides weak signal; you want to see contributors progressively tackle more complicated tasks as they become more acquainted with the project. To aid in this progression, one would be well served to have enough issues of varying difficulty that prospective contributors can tackle. If possible, it may be ideal to have issues build on top of each other and take the contributor on a journey toward mastery. Similarly, it may be good to create open issues that are related to each of the potential GSoC project topics, so that contributors can get familiar with the parts of the codebase they would be working on during the GSoC program. And lastly, consider creating issue templates specifically for GSoC participants, which include detailed instructions, links to relevant documentation, and expected outcomes. This reduces ambiguity and helps set clear expectations for newcomers.

Going forward, we plan to focus on creating well-defined, incremental issues that serve as stepping stones for new contributors to build familiarity and gradually take on more complex tasks.

💡 Starter Issues and Mini-Projects: Offer beginner-friendly issues and smaller tasks early on to help newcomers familiarize themselves with the codebase. Fixing existing bugs or writing tests can be a good starting point.

The Role of AI

I think it's fair to conclude that Generative AI has emerged as both a blessing and a curse in the world of open source contributions. Personally, I am an avid user of LLMs and happy about the innovation they have sparked in the developer tooling space. They can assist non-native English speakers in better communicating their ideas, provide a conversation partner equipped with vast knowledge of even quite remote topics, and can increase developer productivity through code completions and code generation. However, AI has also led to a flood of low-quality PRs generated by AI tools, often filled with hallucinated code or content that doesn't align with the project's actual requirements. While writing code can feel more rewarding than the often tedious task of reviewing it—especially when the code isn't your own—reviewer fatigue becomes a real issue when faced with a barrage of poorly constructed or misaligned PRs.

Contributors must recognize that AI is an assistant, not a replacement for personal responsibility and craftsmanship. We have by now spent a significant amount of effort on automation to filter out low-effort submissions before they even reach the review stage. Besides workflows that close PRs which don't adhere to basic contribution conventions, we have added jobs that post helpful comments on how to set up a development environment or which remind contributors that they have to accept the project's contribution guidelines before their PR can be reviewed. This significantly reduces the burden on reviewers and ensures contributors are aware of expectations from the beginning.
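As a rough illustration of what such a check might look like, the following sketch tests whether a PR description contains a ticked Markdown checkbox for a given item. The helper name `hasCheckedBox` and the checklist wording are hypothetical, and this is not the workflow code we actually run.

```javascript
// Hypothetical helper: returns `true` if the first Markdown task list item containing `label` is ticked.
function hasCheckedBox( body, label ) {
    const lines = body.split( /\r?\n/ );
    for ( const line of lines ) {
        // Match task list items such as `- [x] ...` or `* [ ] ...`:
        const m = /^\s*[-*]\s+\[([ xX])\]\s+(.*)$/.exec( line );
        if ( m && m[ 2 ].includes( label ) ) {
            return ( m[ 1 ] !== ' ' );
        }
    }
    // No matching task list item was found:
    return false;
}

// Example PR description (made up for illustration):
const body = [
    '## Checklist',
    '',
    '- [x] Read, understood, and followed the contributing guidelines.',
    '- [ ] Added tests.'
].join( '\n' );

const ok = hasCheckedBox( body, 'contributing guidelines' );
// returns true

const missing = hasCheckedBox( body, 'Added tests' );
// returns false
```

A job built around a check like this could post a reminder comment when the result is `false`, rather than leaving that to a reviewer.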

Contributor Triage

Another important takeaway is to watch out for contributors claiming multiple issues without completing them. We found that it's best to avoid assigning issues to anyone via the respective GitHub feature and instead focus on encouraging quality contributions over sheer quantity. Additionally, be prepared to manage contributors who may place unrealistic demands on review times, such as insisting on immediate feedback.

One has to be ruthless in prioritizing contributions. This approach ensures that contributors who show genuine interest and effort receive the attention they deserve, leading to higher quality interactions and outcomes for both the project and the contributor. Reviewer time is a limited resource, and it's simply not feasible to provide equal attention to every contributor.

At the end of the day, contributors must invest the time necessary to familiarize themselves with a project's conventions, guidelines, and best practices. If they don't meet this minimum threshold and do not show genuine effort, it's not worth allocating the finite resources of the core team. This may sound harsh, but it's necessary to ensure there is enough time to focus on the high-quality contributions. Otherwise, one ends up in a position where everybody is unhappy with your responsiveness. This may be less of an issue for organizations in niches requiring specialized skills and which may not have as wide an audience as a JavaScript library.

Provide Clear Documentation

Ensure that your project documentation is comprehensive and up-to-date. This includes installation guides, contribution guidelines, and a clear roadmap. Poor documentation can be a significant barrier to entry. During the community bonding period, we found that our documentation was outdated in some areas and that there were issues arising from our setup instructions not working on all operating systems. Providing a devcontainer setup for Visual Studio Code helped to mitigate these issues and streamline the onboarding process.

💡 Contribution Guides: Providing detailed guides on setting up the development environment, navigating the codebase, and submitting contributions is crucial.

Mentor Selection and Training

Choose experienced and committed mentors who can provide guidance and support throughout the program. Consider providing mentor training sessions and setting clear expectations around time commitments and responsibilities to better prepare mentors for their roles. Expect mentoring to be more demanding than envisioned.

We found that having weekly stand-ups allowed contributors to get to know each other and share their progress. We had also, early on, decided to have weekly 1:1s between contributors and mentors, combined with active conversations on PRs, RFC issues, and our project-internal Slack. All these channels helped to keep the communication flowing and ensure that everyone was on the same page. However, it's crucial to try to be responsive. Personally, I could have been better at responding to PRs and questions given how quickly the time flies by, with GSoC being over before you know it!

💡 Encourage mentors to actively communicate with each other about their experiences and challenges, so they can offer consistent advice and collaborate on strategies for effectively supporting contributors.

Post-GSoC Engagement Strategies

After GSoC ends, it's essential to keep contributors engaged in order to build a sustainable community. Continue holding regular office hours, offer additional project ideas, or even invite selected GSoC contributors to mentor the next round of participants. This will go a long way toward creating a sense of belonging and long-term commitment.

Common Pitfalls to Avoid

Overwhelming Newcomers: Don't assign tasks that are too complex or lacking adequate documentation.
Inadequate Support: Ensure mentors are available and can provide adequate guidance.
Poor Documentation: Avoid outdated or incomplete documentation which can create barriers to entry.
Insufficient Community Interaction: Foster a sense of community and two-way communication.

To provide an illustrative example of where we fell prey to the pitfalls above, a number of contributors working on Windows machines initially struggled with setting up their local development environment. Because the core stdlib team primarily develops on macOS and Linux, we are largely unaware of the needs and constraints of Windows users, and our contributing guidelines largely reflected that ignorance. Needless to say, telling people to just use an Ubuntu shell was not sufficient. We could have saved ourselves a lot of back and forth by (a) providing preconfigured dev containers, (b) investing the time necessary to create more comprehensive documentation, and (c) having a quick onboarding session over a higher bandwidth medium than chat.

Advice for Contributors

Early Engagement: Interact with the community and start working on beginner-friendly issues early on. If you start contributing before the application period and show your commitment to the project, you will stand out as a proactive candidate during the selection process. This is probably the biggest hack to get selected for GSoC.
Invest in Project Familiarity Early On: Before contributing code, take time to read through old issues, PR discussions, and any architectural documentation available. Understanding the project's historical context can help avoid misunderstandings and improve the relevance of your contributions.
Prioritize Code Quality and Documentation: Don't rush to make as many contributions as possible. Take your time to write high-quality code and back it up with sufficient documentation and test cases. Especially in stdlib, we place a high priority on ensuring consistency throughout the codebase, so the more your contributions look and feel like stdlib, the more likely your contributions will be accepted. This attention to detail will set you apart from others who may focus solely on quantity and ignore project conventions.
Clear Communication: Don't hesitate to ask questions and seek guidance from mentors and the community. Organizations may be overwhelmed with applications, so stepping up and answering questions on the community forums can help you stand out as well.
Ask for Feedback: Throughout the GSoC program, ask for and incorporate feedback from project mentors. During the GSoC application phase, contributors who clearly demonstrate an ability to receive and act on feedback will stand out. It can be frustrating for project reviewers to repeat the same feedback across multiple PRs, especially concerning project style and conventions. Make it a goal to reduce the number of reviewer comments on each PR. Clean PRs requiring little-to-no review feedback significantly improve the odds of you setting yourself apart from the pack.
Respect Maintainer Time: Be respectful of maintainer time. GSoC can be highly competitive, and, for many, GSoC acceptance is a meaningful résumé item. Recognize, however, that maintainers often have obligations and jobs outside of their open source work. Sometimes it just isn't possible to immediately review your PR or answer your question, especially toward the end of the GSoC application period. You can significantly improve the likelihood of a response if you heed the advice above; namely, invest in project familiarity early on, prioritize code quality and documentation, and incorporate feedback. Maintainers are human, and the more you show you care about their time, the more likely they are to invest in you.
Time Management: Plan your time effectively to meet project milestones and deadlines. The time will fly by, and you don't want to be scrambling to complete your project at the last minute. Break down your project into smaller tasks, and set realistic goals for each week. Where possible, be strategic in your planning, such that, if one task becomes blocked, you can continue making progress by working on other tasks in parallel. If you encounter obstacles, reach out for help sooner rather than later. Being proactive not only ensures you stay on track but also demonstrates your commitment and initiative.
Participate Beyond Code: Engage in discussions beyond code contributions. Once you have familiarized yourself with the project, gotten up to speed on how to contribute, and successfully made contributions to the codebase, help other newcomers by participating in community channels, answering questions, and directing them to appropriate resources. Not only does this show that you are invested in the community, but it also helps reduce maintainer burden—something which is unlikely to go unnoticed.
Be Adaptive and Open to Change: Sometimes your initial project plan may not work out as expected. Be flexible and willing to adjust your project scope or approach based on feedback and evolving project priorities.

💡 Remember that valuable contributions aren't limited to code alone. Participating in community discussions, improving documentation, and offering support to other newcomers are all meaningful ways to contribute and demonstrate commitment to the project.

Acknowledgments

Our heartfelt thanks go out to everyone involved in this year's GSoC, from the mentors and contributors to the broader community, and last but not least, to Google. We're excited to build on the momentum from this summer and look forward to seeing what the future holds for stdlib!

If you're interested in becoming a part of our growing community or exploring the opportunities GSoC can provide, visit our Google Summer of Code repository and join the conversation on our community channels. We're always excited to welcome new contributors!

And if you're just generally interested in contributing or staying updated, be sure to check out the project repository. Don't be shy, and come say hi. We'd love for you to be a part of our community!

Welcoming colors to the stdlib REPL!

Snehil Shah — Mon, 19 Aug 2024 18:50:31 +0000

The stdlib REPL now supports syntax highlighting and custom theming.

The stdlib REPL (Read-Eval-Print Loop) is an interactive interpreter environment for executing JavaScript and enabling easy prototyping, testing, debugging, and programming. With syntax highlighting now added, editing in the REPL becomes way more intuitive and fun.

How to get your hands on the new hotness? Download the latest package from npm, fire it up, and just start typing.

$ npm install -g @stdlib/repl
$ stdlib-repl

We have various themes to get started with. But if you want to make the REPL your own, you can also customize it. We explore customization later in this post.

stdlib

A brief segue about stdlib. stdlib is a standard library for numerical and scientific computation for use in web browsers and in server-side runtimes supporting JavaScript. The library provides high-performance and rigorously tested APIs for data manipulation and transformation, mathematics, statistics, linear algebra, pseudorandom number generation, array programming, and a whole lot more.

We're on a mission to make JavaScript (and TypeScript!) a preferred language for numerical computation. If this sounds interesting to you, check out the project on GitHub, and be sure to give us a star 🌟!

Moving on... 🏃💨

Themes

Where were we? Ah, yes, themes! The REPL comes with the following themes built-in.

Pro tip: You can always use the themes() REPL command to list available themes.

stdlib-ansi-basic: The classic. The default.

stdlib-ansi-light: For the light mode users.

stdlib-ansi-dark: For the normal users.

stdlib-ansi-strong: Expressive and bold.

solarized: My personal favorite.

minimalist: Enough said.

monokai: The one and only.

In order to change to the theme of your choice, use the REPL settings() command.

In [1]: settings( 'theme', 'solarized' )

Customization

You can create your own syntax highlighting themes using a theme definition. A theme definition is an object mapping each token type to its corresponding color. The following code snippet shows the theme definition for the monokai theme.

const monokai = {
    // Keywords:
    'control': 'brightRed',
    'keyword': 'italic brightCyan',
    'specialIdentifier': 'brightMagenta',

    // Literals:
    'string': 'brightYellow',
    'number': 'brightBlue',
    'literal': 'brightBlue',
    'regexp': 'underline yellow',

    // Identifiers:
    'command': 'bold brightGreen',
    'function': 'brightGreen',
    'object': 'italic brightMagenta',
    'variable': null,
    'name': null,

    // Others:
    'comment': 'brightBlack',
    'punctuation': null,
    'operator': 'brightRed'
};

For the full list of supported tokens, see the REPL documentation.

Pro tip: Use the getTheme() REPL command to find out how a theme was built.

In [1]: getTheme( 'solarized' )

Currently, the REPL supports ANSI colors, such as black, red, green, yellow, blue, magenta, cyan, and white, and their brighter variants, such as brightBlack and brightRed.

For more expressive themes, you can use styles, such as bold, italic, underline, strikethrough, and reversed, and background colors, such as bgRed and bgBrightRed.

Lastly, you can go wild by mixing and matching any of the above colors, styles, and background colors. So something like the following works:

italic red bgBrightGreen

Some might say this looks ridiculous, but good to know the REPL supports the ridiculousness!
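For the curious, here is a minimal sketch of how a color string like the one above could be translated into an ANSI escape sequence. The numeric codes are the standard ANSI SGR codes, but the helper names (`code`, `colorize`) are hypothetical, and this is not necessarily how the REPL implements it internally.

```javascript
// The eight base ANSI colors, ordered by their SGR offset:
const COLORS = [ 'black', 'red', 'green', 'yellow', 'blue', 'magenta', 'cyan', 'white' ];

// Standard ANSI SGR codes for text styles:
const STYLES = {
    'bold': 1,
    'italic': 3,
    'underline': 4,
    'reversed': 7,
    'strikethrough': 9
};

// Hypothetical helper: convert a single token (e.g., 'brightRed' or 'bgBrightGreen') to an SGR code.
function code( token ) {
    if ( Object.prototype.hasOwnProperty.call( STYLES, token ) ) {
        return STYLES[ token ];
    }
    let name = token;
    let base = 30; // foreground colors start at 30
    if ( name.startsWith( 'bg' ) ) {
        base = 40; // background colors start at 40
        name = name.slice( 2 );
    }
    if ( /^bright/i.test( name ) ) {
        base += 60; // bright variants are offset by 60
        name = name.slice( 6 );
    }
    const idx = COLORS.indexOf( name.toLowerCase() );
    if ( idx < 0 ) {
        throw new Error( 'unrecognized token: ' + token );
    }
    return base + idx;
}

// Hypothetical helper: wrap text in the escape sequence for a space-separated color string.
function colorize( text, spec ) {
    const codes = spec.trim().split( /\s+/ ).map( code );
    return '\u001b[' + codes.join( ';' ) + 'm' + text + '\u001b[0m';
}

const out = colorize( 'beep', 'italic red bgBrightGreen' );
// returns '\u001b[3;31;102mbeep\u001b[0m'
```

Printing `out` in a terminal supporting ANSI escape sequences should render "beep" in italic red on a bright green background.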

Adding your own theme

To add your theme, use the addTheme() REPL command, as shown in the following REPL snippet.

In [1]: const theme = {
    'string': 'italic red bgBrightGreen',
    'keyword': 'bold magenta',
    // Be the artist...
};

In [2]: addTheme( 'bestThemeEver', theme )

Changed your mind and added something you don't like? No worries. Just use the deleteTheme() REPL command to send the theme into oblivion, as in the following REPL snippet.

In [5]: deleteTheme( 'worstThemeEver' )

Want to call your theme something different? We've got you covered. Just use the renameTheme() REPL command, as in the following REPL snippet.

In [6]: renameTheme( 'bestThemeEver', 'secondBestThemeEver' )

If you prefer spooky action at a distance, simply use the corresponding REPL prototype methods for the above operations. Refer to the REPL documentation for the full list of REPL commands and prototype methods related to syntax highlighting and everything else.

Let's wrap this up

Time to end this post with a quote:

"Coding without syntax highlighting is like trying to read a book with all the words in the wrong order—frustrating, confusing, and not nearly as fun!"

— ChatGPT 4o mini

Boy ain't that the truth!

The stdlib REPL is in constant development, so feel free to reach out with new ideas and identified issues. Your feedback is appreciated and hugely important!

We've got some more REPL news and notes in the pipeline, so stay tuned for the drip. Until next time, cheers and happy REPLing!

Snehil Shah is a computer science undergrad, an audio nerd, and a contributor to stdlib.

The Accessor Protocol

Athan — Tue, 06 Aug 2024 20:39:01 +0000

In this post, I'll introduce you to the accessor protocol for generalized array-like object iteration. First, I'll provide an overview of built-in array-like objects, along with example usage. I'll then show you how you can create your own custom array-like objects using the same element access syntax. Next, we will explore why you might want to go beyond vanilla array-like objects to create more "exotic" variants to accommodate sparsity, deferred computation, and performance considerations. Following this, I'll introduce the accessor protocol and how it compares to possible alternatives. Finally, I'll showcase various example applications.

Sound good?! Great! Let's go!

TL;DR

The accessor protocol (also known as the get/set protocol) defines a standardized way for non-indexed collections to access element values. In order to be accessor protocol-compliant, an array-like object must implement two methods having the following signatures:

function get<T>( index: number ): T {...}
function set<T, U>( value: T, index: number ): U {...}

The protocol allows implementation-dependent behavior when an index is out-of-bounds and, similar to built-in array bracket notation, only requires that implementations support nonnegative index values. In short, the protocol prescribes a minimal set of behavior in order to support the widest possible set of use cases, including, but not limited to, sparse arrays, arrays supporting "lazy" (or deferred) materialization, shared memory views, and arrays which clamp, wrap, or constrain index values.

The following code sample provides an example class returning an array-like object implementing the accessor protocol and supporting strided access over a linear data buffer.

/**
* Class defining a strided array.
*/
class StridedArray {
    // Define private instance fields:
    #length; // array length
    #data;   // underlying data buffer
    #stride; // step size (i.e., the index increment between successive values)
    #offset; // index of the first indexed value in the data buffer

    /**
    * Returns a new StridedArray instance.
    *
    * @param {integer} N - number of indexed elements
    * @param {ArrayLikeObject} data - underlying data buffer
    * @param {number} stride - step size
    * @param {number} offset - index of the first indexed value in the data buffer
    * @returns {StridedArray} strided array instance
    */
    constructor( N, data, stride, offset ) {
        this.#length = N;
        this.#data = data;
        this.#stride = stride;
        this.#offset = offset;
    }

    /**
    * Returns the array length.
    *
    * @returns {number} array length
    */
    get length() {
        return this.#length;
    }

    /**
    * Returns the element located at a specified index.
    *
    * @param {number} index - element index
    * @returns {(void|*)} element value
    */
    get( index ) {
        return this.#data[ this.#offset + index*this.#stride ];
    }

    /**
    * Sets the value for an element located at a specified index.
    *
    * @param {*} value - value to set
    * @param {number} index - element index
    */
    set( value, index ) {
        this.#data[ this.#offset + index*this.#stride ] = value;
    }
}

// Define a data buffer:
const buf = new Float64Array( [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 ] );

// Create a strided view over the data buffer:
const x1 = new StridedArray( 4, buf, 2, 1 );

// Retrieve the second element:
const v1 = x1.get( 1 );
// returns 4.0

// Mutate the second element:
x1.set( v1*10.0, 1 );

// Retrieve the second element:
const v2 = x1.get( 1 );
// returns 40.0

// Create a new strided view over the same data buffer, but reverse the elements:
const x2 = new StridedArray( 4, buf, -2, buf.length-1 );

// Retrieve the second element:
const v3 = x2.get( 1 );
// returns 6.0

// Mutate the second element:
x2.set( v3*10.0, 1 );

// Retrieve the second element:
const v4 = x2.get( 1 );
// returns 60.0

// Retrieve the third element from the first array view:
const v5 = x1.get( 2 );
// returns 60.0

As shown in the code sample above, a strided array is a powerful abstraction over built-in arrays and typed arrays, as it allows for arbitrary views having custom access patterns over a single buffer. In fact, strided arrays are the conceptual basis for multi-dimensional arrays, such as NumPy's ndarray and stdlib's ndarray, which are the fundamental building blocks of modern numerical computing. Needless to say, the example above speaks to the utility of going beyond built-in bracket syntax and providing APIs for generalized array-like object iteration.
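On the consuming side, a function can support both indexed collections and accessor arrays by choosing an element getter up front. The following is a minimal sketch; the names `isAccessorArray`, `accessorSum`, and `wrapped` are hypothetical and chosen purely for illustration.

```javascript
// Hypothetical check: does an object appear to implement the accessor protocol?
function isAccessorArray( x ) {
    return ( typeof x.get === 'function' && typeof x.set === 'function' );
}

// Sum the elements of either an indexed collection or an accessor array:
function accessorSum( x ) {
    // Resolve how to read an element once, outside of the loop:
    const get = ( isAccessorArray( x ) ) ? ( i ) => x.get( i ) : ( i ) => x[ i ];
    let total = 0;
    for ( let i = 0; i < x.length; i++ ) {
        total += get( i );
    }
    return total;
}

// A minimal accessor array which wraps index values around a smaller buffer:
const data = [ 1, 2, 3 ];
const wrapped = {
    'length': 5,
    'get': ( i ) => data[ i % data.length ],
    'set': ( v, i ) => {
        data[ i % data.length ] = v;
    }
};

const s1 = accessorSum( [ 1, 2, 3 ] );
// returns 6

const s2 = accessorSum( new Float64Array( [ 1, 2, 3 ] ) );
// returns 6

const s3 = accessorSum( wrapped );
// returns 9
```

Note that typed arrays have a built-in `set` method but no `get` method, which is why the check above requires both before treating an object as an accessor array.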

To learn more about the accessor protocol and its use cases, continue reading the rest of the post below! 🚀

stdlib

A brief overview of stdlib. stdlib is a standard library for numerical and scientific computation for use in web browsers and in server-side runtimes supporting JavaScript. The library provides high-performance and rigorously tested APIs for data manipulation and transformation, mathematics, statistics, linear algebra, pseudorandom number generation, array programming, and a whole lot more.

Introduction

In JavaScript, we use bracket notation to access individual array elements. For example, in the following code sample, we use bracket notation to retrieve the second element in an array.

const x = [ 1, 2, 3 ];

// Retrieve the second element:
const v = x[ 1 ];
// returns 2

This works for both generic array and typed array instances. In the next code sample, we repeat the previous operation on a typed array.

const x = new Float64Array( [ 1, 2, 3 ] );
// returns <Float64Array>

// Retrieve the second element:
const v = x[ 1 ];
// returns 2

Similarly, one can use bracket notation for built-in array-like objects, such as strings. In the next code sample, we retrieve the second UTF-16 code unit in a string.

const s = 'beep boop';

// Retrieve the second UTF-16 code unit:
const v = s[ 1 ];
// returns 'e'

In order to determine how many elements are in an array-like object, we can use the length property, as shown in the following code sample.

const x = [ 1, 2, 3 ];

const len = x.length;
// returns 3

Arrays and typed arrays are referred to as indexed collections, where elements are ordered according to their index value. An array-like object is thus an ordered list of values that one refers to using a variable name and index.

While JavaScript arrays and typed arrays have many methods (e.g., forEach, map, filter, sort, and more), the only required property that any array-like object (built-in or custom) must have is a length property. The length property tells us the maximum number of elements for which we can apply an operation. Without it, we'd never know when to stop iterating in a for loop!

Custom array-like objects

We can create our own custom array-like objects using vanilla object literals. For example, in the following code sample, we create an object having numbered keys and a length property and retrieve the value associated with the key 1 (i.e., the second element).

const x = {
    'length': 3,
    '0': 1,
    '1': 2,
    '2': 3
};

// Retrieve the second element:
const v = x[ 1 ];
// returns 2

Notice that we're able to use numeric "indices". This is because, per the ECMAScript Standard, any non-symbol value used as a key is first converted to a string before performing property-value look-up. Consequently, so long as downstream consumers don't assume the existence of specialized methods and stick to indexed iteration, they can remain agnostic as to the kind of array-like object they are provided.
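
As a quick check of the key conversion described above, the following minimal sketch (variable names are illustrative only) shows that a numeric key and its string equivalent resolve to the same property.

```javascript
const x = {
    'length': 2,
    '0': 'a',
    '1': 'b'
};

// A numeric key is converted to a string before look-up:
const v1 = x[ 1 ];
// returns 'b'

const v2 = x[ '1' ];
// returns 'b'

// Own property keys are reported as strings, with integer-like keys first:
const keys = Object.keys( x );
// returns [ '0', '1', 'length' ]
```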

For example, suppose we want to compute the sum of all elements in an array-like object. We could define the following function which accepts, as its sole argument, any object having a length property and supporting value access via numeric indices.

function sum( x ) {
    let total = 0;
    for ( let i = 0; i < x.length; i++ ) {
        total += x[ i ];
    }
    return total;
}

We can then provide all manner of array-like objects and sum is none-the-wiser, being capable of handling them all. In the following code sample, we separately provide a generic array, a typed array, and an array-like object, and, for each input value, the sum function readily computes the sum of all elements.

const x1 = [ 1, 2, 3 ];
const s1 = sum( x1 );
// returns 6

const x2 = new Int32Array( [ 1, 2, 3 ] );
const s2 = sum( x2 );
// returns 6

const x3 = {
    'length': 3,
    '0': 1,
    '1': 2,
    '2': 3
};
const s3 = sum( x3 );
// returns 6

This is great! So long as downstream consumers make minimal assumptions regarding the existence of prototype methods, preferably avoiding the use of methods entirely, we can create functional APIs capable of operating on any indexed collection.

But wait, what about those scenarios in which we want to use alternative data structures, such that property-value pairs are not so neatly aligned, or we want to leverage deferred computation, or create views on existing array-like objects? How can we handle those use cases?

Motivating use cases

Sparse arrays

Up until this point, we've concerned ourselves with "dense" arrays (i.e., arrays in which all elements can be stored sequentially in a contiguous block of memory). In JavaScript, in addition to dense arrays, we have the concept of "sparse" arrays. The following code sample demonstrates sparse array creation by setting an element located at an index which vastly exceeds the length of the target array.

const x = [];

// Convert `x` into a sparse array:
x[ 10000 ] = 3.14;

// Retrieve the second element:
const v1 = x[ 1 ];
// returns undefined

// Retrieve the last element:
const v10000 = x[ 10000 ];
// returns 3.14

// Retrieve the number of elements:
const len = x.length;
// returns 10001

Suffice it to say that, because we assigned directly to index 10000 rather than using the Array.prototype.push method to fill in every element up to that index, JavaScript engines responsible for compiling and optimizing your code will typically treat the array as if it were a normal object, which is a reasonable optimization for avoiding unnecessary memory allocation. Creating a sparse array in this fashion is often referred to as converting an array into "dictionary-mode", where an array is stored in a manner similar to a regular object instance. For the purposes of element access, the above code sample is effectively equivalent to the following code sample, in which we explicitly define x to be an array-like object containing a single defined value at index 10000.

const x = {
    'length': 10001,
    '10000': 3.14
};

// Retrieve the second element:
const v1 = x[ 1 ];
// returns undefined

// Retrieve the last element:
const v10000 = x[ 10000 ];
// returns 3.14

Creating sparse arrays in this manner is fine for many use cases, but less than optimal in others. For example, in numerical computing, we'd prefer that the "holes" (i.e., undefined values) in our sparse array would be 0, rather than undefined. This way, the sum function we defined above could work on both sparse and dense arrays alike (setting aside, for the moment, any performance considerations).
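
To make the problem with "holes" concrete, the following sketch re-uses the sum function defined earlier and applies it to a small sparse array (the array length of five is chosen only for illustration). Because each hole is read as undefined, the running total becomes NaN.

```javascript
function sum( x ) {
    let total = 0;
    for ( let i = 0; i < x.length; i++ ) {
        total += x[ i ];
    }
    return total;
}

// Create a sparse array having a single defined element:
const x = [];
x[ 4 ] = 3.14;

// Holes are read as `undefined`, and `0 + undefined` is `NaN`:
const s = sum( x );
// returns NaN
```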

Deferred computation

Next up, consider the case in which we want to avoid materializing array values until they are actually needed. For example, in the following snippet, we'd like the ability to define an array-like object without any pre-defined values and which supports "lazy" materialization such that values are materialized upon element access.

const x = {
    'length': 3
};

// Materialize the first element:
const v0 = x[ 0 ];
// returns 1

// Materialize the second element:
const v1 = x[ 1 ];
// returns 2

// Materialize the third element:
const v2 = x[ 2 ];
// returns 3

To implement lazy materialization in JavaScript, we could utilize the Iterator protocol; however, iterators are not directly "indexable" in a manner similar to array-like objects, and they don't generally have a length property indicating how many elements they contain. To know when they finish, we need to explicitly check the done property of each iteration result. While we can use the built-in for...of statement to iterate over Iterables, doing so requires either updating our sum implementation to use for...of, and thus requiring that all provided array-like objects also be Iterables, or introducing branching logic based on the type of value provided. Neither option is ideal, with both entailing increased complexity, constraints, performance costs, or, more likely, some combination of the above.
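
For reference, the following sketch shows what checking the done property looks like when consuming an iterator by hand. The generator and the manualIteratorSum helper are hypothetical names introduced only for this illustration.

```javascript
// A generator which lazily produces the values 1, 2, and 3:
function* values() {
    yield 1;
    yield 2;
    yield 3;
}

// Sums iterated values by explicitly checking the `done` property:
function manualIteratorSum( it ) {
    let total = 0;
    while ( true ) {
        const r = it.next();
        if ( r.done ) {
            break;
        }
        total += r.value;
    }
    return total;
}

const s = manualIteratorSum( values() );
// returns 6
```

Note that there is no length to consult and no way to ask for, say, the third value without first requesting the two values before it.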

Shared memory views

For our next motivating example, consider the case of creating arbitrary views over the same underlying block of memory. While typed arrays support creating contiguous views (e.g., by providing a shared ArrayBuffer to typed array constructors), situations may arise where we want to define non-contiguous views. In order to avoid unnecessary memory allocation, we'd like the ability to define arbitrary iteration patterns which allow accessing particular elements within an underlying linear data buffer.

In the following snippet, we illustrate the use case of an array-like object containing complex numbers which are stored in memory as interleaved real and imaginary components. To allow accessing and manipulating just the real components within the array, we'd like the ability to create a "view" atop the underlying data buffer which accesses every other element (i.e., just the real components). We could similarly create a "view" for only accessing the imaginary components.

// Define a data buffer of interleaved real and imaginary components:
const buf = [ 1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0 ];

// Create a complex number array:
const x = new ComplexArray( buf );

// Retrieve the second element:
const z1 = x[ 1 ];
// returns Complex<3.0, -4.0>

// Create a view which only accesses real components:
const re = x.reals();

// Retrieve the real component of the second complex number:
const r = re[ 1 ];
// returns 3.0

// Mutate the real component:
re[ 1 ] = 10.0;

// Retrieve the second element of the complex number array:
const z2 = x[ 1 ];
// returns Complex<10.0, -4.0>

To implement such views, we'd need three pieces of information: (1) an underlying data buffer, (2) an array stride which defines the number of locations in memory between successive array elements, and (3) an offset which defines the location in memory of the first indexed element. For contiguous arrays, the array stride is unity, and the offset is zero. In the example above, for a real component view, the array stride is two, and the offset is zero; for an imaginary component view, the array stride is also two, but the offset is unity. Ideally, we could define a means for providing generalized access such that array-like objects which provide abstracted element indexing can also be provided to array-agnostic APIs, such as sum above.
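
The relationship between a view index and a location in the underlying buffer can be sketched in a single line of arithmetic. In the following snippet, bufferIndex is a hypothetical helper (it is not part of any library) which applies the stride and offset described above to the interleaved buffer from the previous example.

```javascript
// Hypothetical helper mapping a view index to a buffer index:
function bufferIndex( i, stride, offset ) {
    return offset + ( i*stride );
}

// Interleaved real and imaginary components:
const buf = [ 1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0 ];

// Real component of the second complex number (stride: 2, offset: 0):
const re1 = buf[ bufferIndex( 1, 2, 0 ) ];
// returns 3.0

// Imaginary component of the second complex number (stride: 2, offset: 1):
const im1 = buf[ bufferIndex( 1, 2, 1 ) ];
// returns -4.0
```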

Backing data structures

As a final example, consider the case where we'd like to compactly store an ordered sequence of boolean values. While we could use generic arrays (e.g., [true,false,...,true]) or Uint8Array typed arrays for this, doing so would not be the most memory-efficient approach. Instead, a more memory-efficient data structure would be a bit array composed of a sequence of integer words in which, within each word of n bits, a 1-bit indicates a value of true and a 0-bit indicates a value of false.

The following code snippet provides a general idea of mapping a sequence of boolean values to bits, along with desired operations for setting and retrieving boolean elements.

const seq = [ true, false, true, ..., false, true, false ];
// bit array:    1      0     1  ...      0     1      0    => 101...010

const x = new BooleanBitArray( seq );

// Retrieve the first element:
const v0 = x[ 0 ];
// returns true

// Retrieve the second element:
const v1 = x[ 1 ];
// returns false

// Retrieve the third element:
const v2 = x[ 2 ];
// returns true

// Set the second element:
x[ 1 ] = true;

// Retrieve the second element:
const v3 = x[ 1 ];
// returns true

In JavaScript, we could attempt to subclass array or typed array built-ins in order to allow setting and getting elements via bracket notation; however, this approach would prove limiting, as subclassing alone does not allow intercepting property access (e.g., x[i]), which would be needed in order to map an index to a specific bit. We could try to combine subclassing with Proxy objects, but this would come with a steep performance cost due to property accessor indirection, something which we'll revisit later in this post.
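
The index-to-bit mapping mentioned above might look something like the following sketch. It assumes 32-bit words stored in a Uint32Array with the least significant bit first; getBit and setBit are hypothetical helpers, and the BooleanBitArray class from the earlier snippet is not implemented here.

```javascript
// Assumption: bits are stored in 32-bit words, least significant bit first.
const words = new Uint32Array( 2 ); // room for 64 boolean values

// Hypothetical helper for reading the boolean value at index `i`:
function getBit( words, i ) {
    const w = i >>> 5; // word index: floor( i/32 )
    const b = i & 31;  // bit position within the word: i % 32
    return ( ( words[ w ] >>> b ) & 1 ) === 1;
}

// Hypothetical helper for writing the boolean value at index `i`:
function setBit( words, value, i ) {
    const w = i >>> 5;
    const mask = 1 << ( i & 31 );
    if ( value ) {
        words[ w ] |= mask;
    } else {
        words[ w ] &= ~mask;
    }
}

setBit( words, true, 0 );
setBit( words, true, 33 );

const v0 = getBit( words, 0 );
// returns true

const v1 = getBit( words, 1 );
// returns false

const v33 = getBit( words, 33 );
// returns true
```

Neither helper can be reached through plain bracket notation, which is why some form of explicit method call (or interception) is needed.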

Accessor Protocol

To accommodate the above use cases and more, we'd like to introduce a conceptually simple, but very powerful, new protocol: the accessor protocol for generalized element access and iteration of array-like objects. The protocol doesn't require new syntax or built-ins. The protocol only defines a standard way to get and set element values.

Any array-like object can implement the accessor protocol (also known as the get/set protocol) by following two conventions.

Define a get method. A get method accepts a single argument: an integer value specifying the index of the element to return. Similar to bracket notation for built-in array and typed array objects, the protocol requires that the get method be defined for integer values that are nonnegative and within array bounds. Protocol-compliant implementations may choose to support negative index values, but that behavior should not be considered portable. Similarly, how implementations choose to handle out-of-bounds indices is implementation-dependent; implementations may return undefined, raise an exception, wrap, clamp, or some other behavior. By not placing restrictions on out-of-bounds behavior, the protocol can more readily accommodate a broader set of use cases.
Define a set method. A set method accepts two arguments: the value to set and an integer value specifying the index of the element to replace. Similar to the get method, the protocol requires that the set method be defined for integer indices that are nonnegative and within array bounds. And similarly, protocol-compliant implementations may choose to support negative index values, but that behavior should not be considered portable, and how implementations choose to handle out-of-bounds indices is implementation-dependent.

The following code sample demonstrates an accessor protocol-compliant array-like object.

// Define a data buffer:
const data = [ 1, 2, 3, 4, 5 ];

// Define a minimal array-like object supporting the accessor protocol:
const x = {
    'length': 5,
    'get': ( index ) => data[ index ],
    'set': ( value, index ) => data[ index ] = value
};

// Retrieve the third element:
const v1 = x.get( 2 );
// returns 3

// Set the third element:
x.set( 10, 2 );

// Retrieve the third element:
const v2 = x.get( 2 );
// returns 10

Three things to note about the above code sample.

The above example demonstrates another potential use case—namely, an array-like object which doesn't own the underlying data buffer and, instead, acts as a proxy for element access requests.
The signature for the set method may seem counter-intuitive, as one might expect the arguments to be reversed. The rationale for value being the first argument and index being the second argument is to be consistent with built-in typed array set method conventions, where the first argument is an array from which to copy values and the second argument is optional and specifies the offset at which to begin writing values from the first argument. While one could argue that set(v,i) is not ideal, given the argument order precedent found in built-ins, the protocol follows that precedent in order to avoid confusion.
In contrast to the built-in typed array set method which expects an array (or typed array) for the first argument, the accessor protocol only requires that protocol-compliant implementations support a single element value. Protocol-compliant implementations may choose to support first arguments which are array-like objects and do so in a manner emulating arrays and typed arrays; however, such behavior should not be considered portable.

In short, in order to be accessor protocol-compliant, an array-like object only needs to support single element retrieval and mutation via dedicated get and set methods, respectively.

While built-in typed arrays provide a set method, they are not accessor protocol-compliant, as they lack a dedicated get method, and built-in arrays are also not accessor protocol-compliant, as they lack both a get and a set method. Their lack of compliance is expected and, from the perspective of the protocol, by design in order to distinguish indexed collections from accessor protocol-compliant array-like objects.
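
The following sketch illustrates both points using only built-in behavior: the typed array set method copies values from an array beginning at an offset, and neither typed arrays nor generic arrays expose a get method.

```javascript
const x = new Float64Array( [ 1, 2, 3, 4 ] );

// Built-in signature: copy values from an array, starting at an offset.
x.set( [ 10, 20 ], 2 );

const v = x[ 3 ];
// returns 20

// Typed arrays have a `set` method, but no `get` method:
const hasSet = ( typeof x.set === 'function' );
// returns true

const hasGet = ( typeof x.get === 'function' );
// returns false

// Generic arrays have neither:
const arr = [ 1, 2, 3 ];
const bool = ( typeof arr.get === 'function' || typeof arr.set === 'function' );
// returns false
```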

Array-like objects implementing the accessor protocol should be expected to pay a small, but likely non-negligible, performance penalty relative to indexed collections using bracket notation for element access. As such, we expect that performance-conscious array-agnostic APIs will maintain two separate code paths: one for indexed collections and one for collections implementing the accessor protocol. Hence, the presence or absence of get and set methods provides a useful heuristic for determining which path takes priority. In general, for indexed collections which are also accessor protocol-compliant, the get and set methods should always take precedence over bracket notation.

The following code sample refactors the sum API defined above to accommodate array-like objects supporting the accessor protocol.

function isAccessorArray( x ) {
    return ( typeof x.get === 'function' && typeof x.set === 'function' );
}

function sum( x ) {
    let total = 0;

    // Handle accessor-protocol compliant collections...
    if ( isAccessorArray( x ) ) {
        for ( let i = 0; i < x.length; i++ ) {
            total += x.get( i );
        }
        return total;
    }
    // Handle indexed collections...
    for ( let i = 0; i < x.length; i++ ) {
        total += x[ i ];
    }
    return total;
}

For array-agnostic APIs which prefer brevity over performance optimization, one can refactor the previous code sample to use a small, reusable helper function which abstracts array element access and allows loop consolidation. A demonstration of this refactoring is shown in the following code sample.

function isAccessorArray( x ) {
    return ( typeof x.get === 'function' && typeof x.set === 'function' );
}

function array2accessor( x ) {
    if ( isAccessorArray( x ) ) {
        return x;
    }
    return {
        'length': x.length,
        'get': ( i ) => x[ i ],
        'set': ( v, i ) => x[ i ] = v
    };
}

function sum( x ) {
    let total = 0;

    x = array2accessor( x );
    for ( let i = 0; i < x.length; i++ ) {
        total += x.get( i );
    }
    return total;
}

As before, we can then provide all manner of array-like objects, including those supporting the accessor protocol, and sum is none-the-wiser, being capable of handling them all. In the following code sample, we separately provide a generic array, a typed array, an array-like object, and a "lazy" array implementing the accessor protocol, and, for each input value, the sum function readily computes the sum of all elements.

const x1 = [ 1, 2, 3 ];
const s1 = sum( x1 );
// returns 6

const x2 = new Int32Array( [ 1, 2, 3 ] );
const s2 = sum( x2 );
// returns 6

const x3 = {
    'length': 3,
    '0': 1,
    '1': 2,
    '2': 3
};
const s3 = sum( x3 );
// returns 6

const x4 = {
    'length': 3,
    'get': ( i ) => i + 1,
    'set': ( v, i ) => x4[ i ] = v
};
const s4 = sum( x4 );
// returns 6
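
To tie the protocol back to the motivating use cases, the following is a minimal sketch of two accessor protocol-compliant objects: a sparse array whose "holes" read as zero and a strided view over a linear buffer. The sparseArray and stridedView factory functions, along with their parameter order, are hypothetical and introduced only for illustration.

```javascript
function sum( x ) {
    let total = 0;
    for ( let i = 0; i < x.length; i++ ) {
        total += x.get( i );
    }
    return total;
}

// Hypothetical sparse array whose "holes" read as zero:
function sparseArray( len ) {
    const data = new Map();
    return {
        'length': len,
        'get': ( i ) => ( data.has( i ) ? data.get( i ) : 0 ),
        'set': ( v, i ) => {
            data.set( i, v );
        }
    };
}

// Hypothetical strided view over a linear data buffer:
function stridedView( buf, len, stride, offset ) {
    return {
        'length': len,
        'get': ( i ) => buf[ offset + ( i*stride ) ],
        'set': ( v, i ) => {
            buf[ offset + ( i*stride ) ] = v;
        }
    };
}

const x = sparseArray( 10001 );
x.set( 3.14, 10000 );

const s1 = sum( x );
// returns 3.14

// Interleaved real and imaginary components:
const buf = [ 1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0 ];
const re = stridedView( buf, 4, 2, 0 );
const im = stridedView( buf, 4, 2, 1 );

const s2 = sum( re );
// returns 16

const s3 = sum( im );
// returns -20

// Mutating the view mutates the underlying buffer:
re.set( 10.0, 1 );
const v = buf[ 2 ];
// returns 10
```

In both cases, the consumer only ever sees a length property and the get and set methods; how and where values are stored remains an implementation detail.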

Alternatives

At this point, you may be thinking that the accessor protocol seems useful, but why invent something new? Doesn't JavaScript already have mechanisms for inheriting indexed collection semantics (subclassing built-ins), supporting lazy materialization (iterators), proxying element access requests (Proxy objects), and accessing elements via a method (Array.prototype.at)?

Yes, JavaScript does have built-in mechanisms for supporting, at least partially, the use cases outlined above; however, each approach has limitations, which I'll discuss below.

Subclassing built-ins

In the early days of the web and prior to built-in subclassing support, third-party libraries would commonly add methods directly to the prototypes of built-in global objects in order to expose functionality missing from the JavaScript standard library—a practice which was, and still remains, frowned upon. After standardization of ECMAScript 2015, JavaScript gained support for subclassing built-ins, including arrays and typed arrays. By subclassing built-ins, we can create specialized indexed collections which not only extend built-in behavior, but also retain the semantics of bracket notation for indexed collections. Subclassing can be particularly beneficial when wanting to augment inherited classes with new properties and methods.

The following code sample demonstrates extending the built-in Array class to support in-place element-wise addition.

/**
* Class which subclasses the built-in Array class.
*/
class SpecialArray extends Array {
    /**
    * Performs in-place element-wise addition.
    *
    * @param {ArrayLikeObject} other - input array
    * @throws {RangeError} must have the same number of elements
    * @returns {SpecialArray} the mutated array
    */
    add( other ) {
        if ( other.length !== this.length ) {
            throw new RangeError( 'Must provide an array having the same length.' );
        }
        for ( let i = 0; i < this.length; i++ ) {
            this[ i ] += other[ i ];
        }
        return this;
    }
}

// Create a new SpecialArray instance:
const x = new SpecialArray( 10 );

// Call an inherited method to fill the array:
x.fill( 5 );

// Retrieve the second element:
const v1 = x[ 1 ];
// returns 5

// Create an array to add:
const y = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ];

// Perform element-wise addition:
x.add( y );

// Retrieve the second element:
const v2 = x[ 1 ];
// returns 7

While powerful, subclassing built-ins has several limitations.

With respect to the use cases discussed above, subclassing built-ins only satisfies the desire for preserving built-in bracket notation semantics. Subclassing does not confer support for lazy materialization, separate backing data structures, or shared memory views.
Subclassing built-ins imposes a greater implementation burden on subclasses. Particularly for more "exotic" array types, such as read-only arrays, subclasses may be forced to override and re-implement parent methods in order to ensure consistent behavior (e.g., returning a collection having a desired instance type).
Subclassing built-ins imposes an ongoing maintenance burden. As the ECMAScript Standard evolves and built-in objects gain additional properties and methods, those properties and methods may need to be overridden and re-implemented in order to preserve desired semantics.
Subclassing built-ins influences downstream user expectations. If a subclass inherits from a Float64Array, users will likely expect that any subclass satisfying an instanceof check supports all inherited methods, some of which may not be possible to support (e.g., for a read-only array, methods supporting mutation). Distinct (i.e., non-coupled) classes which explicitly own the interface contract will likely be better positioned to manage user expectations.
While subclassing built-ins can encourage reuse, object-oriented programming design patterns can more generally lead to code bloat (read: increased bundle sizes in web applications), as the more methods are added or overridden, the less likely any one of those methods is actually used in a given application.

For the reasons listed above, inheriting from built-ins is generally discouraged in favor of composition due to non-negligible performance and security impacts. One of the principal aims of the accessor protocol is to provide the smallest API surface area necessary in order to facilitate generalized array-like object iteration. Subclassing built-ins is unable to fulfill that mandate.
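
As a small illustration of the implementation burden described above, consider the following sketch of a naive "read-only" array. The ReadOnlyArray class is hypothetical and deliberately incomplete: it overrides a single mutation method, leaving every other inherited method, as well as bracket notation, free to mutate the array.

```javascript
// Hypothetical (and intentionally incomplete) "read-only" array:
class ReadOnlyArray extends Array {
    push() {
        throw new TypeError( 'Cannot modify a read-only array.' );
    }
}

const x = ReadOnlyArray.from( [ 1, 2, 3 ] );

// The overridden method behaves as intended:
let threw = false;
try {
    x.push( 4 );
} catch ( err ) {
    threw = true;
}
// threw is now true

// But other inherited mutation methods still work...
x.fill( 0 );
const v1 = x[ 0 ];
// returns 0

// ...and bracket notation bypasses methods entirely:
x[ 1 ] = 5;
const v2 = x[ 1 ];
// returns 5

// Inherited methods return subclass instances by default:
const y = x.map( ( v ) => v * 2 );
const bool = ( y instanceof ReadOnlyArray );
// returns true
```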

Iterators

Objects implementing the iterator protocol can readily support deferred computation (i.e., "lazy" materialization), but, for several of the use cases outlined above, the iterator protocol has limited applicability. More broadly, relying on the iterator protocol has three limitations.

First, as alluded to earlier in this post, the iterator protocol does not require that objects have a length property, and, in fact, iterators are allowed to be infinite. As a consequence, for operations requiring fixed memory allocation (e.g., as might be the case when needing to materialize values before passing a typed array from JavaScript to C within a Node.js native add-on), the only way to know how much memory to allocate is by first materializing all iterator values. Doing so may require first filling a temporary array before values can be copied to a final destination. This process is likely to be inefficient.

Furthermore, operations involving multiple iterators can quickly become complex. For example, suppose I want to perform element-wise addition for two iterators X and Y (i.e., x0+y0, x1+y1, ..., xn+yn). This works fine if X and Y have the same "length", but what if they have different lengths? Should iteration stop once one of the iterators ends? Or should a fill value, such as zero, be used? Or maybe this is unexpected behavior, and we should raise an exception? Accordingly, generalized downstream APIs accepting iterators may require tailored options to support various edge cases which simply aren't as applicable when working with array-like objects.
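
The following sketch implements just one of the policies mentioned above (stopping as soon as either iterator ends). The iterAdd generator is a hypothetical helper, and a different policy would require a different implementation or additional options.

```javascript
// Hypothetical helper: stop as soon as either iterator ends.
function* iterAdd( X, Y ) {
    while ( true ) {
        const x = X.next();
        const y = Y.next();
        if ( x.done || y.done ) {
            return;
        }
        yield x.value + y.value;
    }
}

const X = [ 1, 2, 3 ][ Symbol.iterator ]();
const Y = [ 10, 20 ][ Symbol.iterator ]();

const out = Array.from( iterAdd( X, Y ) );
// returns [ 11, 22 ]
```

Even in this small sketch, a subtle edge case appears: the third value of X is consumed and discarded before we discover that Y has finished.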

We could, of course, define a protocol requiring that iterators have a length property, but that leads us to the next limitation: iterators do not support random access. In order to access the n-th iterated value, one must materialize the previous n-1 values. This is also likely to be inefficient.
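
The cost of emulating random access can be sketched as follows. The nth helper and the squares generator are hypothetical names used only for illustration; the point is that reaching the value at index n requires stepping through every preceding value.

```javascript
// Hypothetical helper returning the n-th (zero-based) iterated value:
function nth( it, n ) {
    let r = it.next();
    for ( let i = 0; i < n && !r.done; i++ ) {
        r = it.next();
    }
    return ( r.done ) ? void 0 : r.value;
}

// An infinite iterator of squared integers:
function* squares() {
    for ( let i = 0; ; i++ ) {
        yield i * i;
    }
}

// Requires materializing (and discarding) the four preceding values:
const v = nth( squares(), 4 );
// returns 16
```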

Lastly, in general, code paths operating on iterators are significantly slower than equivalent code paths operating on indexed collections. While the accessor protocol does introduce overhead relative to using bracket notation due to explicitly needing to call a method, the overhead is less than the overhead introduced by iterators.

The following code sample defines three functions: one for computing the sum of an indexed collection, one for computing the sum of an array-like object implementing the accessor protocol, and a third for computing the sum of an iterator using JavaScript's built-in for...of syntax.

function indexedSum( x ) {
    let total = 0;
    for ( let i = 0; i < x.length; i++ ) {
        total += x[ i ];
    }
    return total;
}

function accessorSum( x ) {
    let total = 0;
    for ( let i = 0; i < x.length; i++ ) {
        total += x.get( i );
    }
    return total;
}

function iteratorSum( x ) {
    let total = 0;
    for ( const v of x ) {
        total += v;
    }
    return total;
}

To assess the performance of each function, I ran benchmarks on an Apple M1 Pro running macOS and Node.js v20.9.0. For a set of array lengths ranging from ten elements to one million elements, I repeated benchmarks three times and used the maximum observed rate for subsequent analysis and chart display. The results are provided in the following grouped column chart.

In the chart above, columns along the x-axis are grouped according to input array/iterator length. Accordingly, the first group of columns corresponds to input arrays/iterators having 10 elements, the second group to input arrays/iterators having 100 elements, and so on and so forth. The y-axis corresponds to normalized rates relative to the performance observed for indexed collections. For example, if the maximum observed rate when summing over an indexed collection was 100 iterations per second and the maximum observed rate when summing over an iterator was 70 iterations per second, the normalized rate is 70/100, or 0.7. Hence, a rate equal to unity indicates an observed rate equal to that of indexed collections. Anything less than unity indicates an observed rate less than that of indexed collections (i.e., summation involving a given input was slower than using built-in bracket notation). Anything greater than unity indicates an observed rate greater than that of indexed collections (i.e., summation involving a given input was faster than using built-in bracket notation).

From the chart, we can observe that, for all array lengths, neither accessor protocol-compliant array-like objects nor iterators matched or exceeded the performance of indexed collections. Array-like objects implementing the accessor protocol were 15% slower than indexed collections, and iterators were 30% slower than indexed collections. These results confirm that the accessor protocol introduces an overhead relative to indexed collections, but not nearly as much as the overhead introduced by iterators.

In short, the accessor protocol is both more flexible and more performant than using iterators.
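
For readers wanting to run a rough comparison of their own, the following is a simplified timing sketch. It is not the benchmark harness used to produce the results above; the rate helper, the iteration count, and the array length are all assumptions, and observed numbers will vary by engine and hardware.

```javascript
function indexedSum( x ) {
    let total = 0;
    for ( let i = 0; i < x.length; i++ ) {
        total += x[ i ];
    }
    return total;
}

function accessorSum( x ) {
    let total = 0;
    for ( let i = 0; i < x.length; i++ ) {
        total += x.get( i );
    }
    return total;
}

// Hypothetical helper returning an approximate rate (calls per second):
function rate( fcn, x, iterations ) {
    let total = 0;
    const start = performance.now();
    for ( let i = 0; i < iterations; i++ ) {
        total += fcn( x );
    }
    const elapsed = ( performance.now() - start ) / 1000; // seconds
    if ( total !== total ) {
        // Using the result helps keep the loop from being optimized away:
        throw new Error( 'unexpected NaN' );
    }
    return iterations / elapsed;
}

// Create an indexed collection and an accessor-based wrapper around it:
const data = new Float64Array( 1000 ).fill( 1.0 );
const accessor = {
    'length': data.length,
    'get': ( i ) => data[ i ],
    'set': ( v, i ) => {
        data[ i ] = v;
    }
};

const r1 = rate( indexedSum, data, 1000 );
const r2 = rate( accessorSum, accessor, 1000 );

// Normalized rate relative to indexed collections:
console.log( r2 / r1 );
```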

Computed properties

Another alternative to the accessor protocol is to use defined properties having property accessors. When implementing lazy materialization and proxied element access prior to ECMAScript standardization of Proxy objects and support for Array subclassing, property descriptors were the primary way to emulate the built-in bracket notation of indexed collections.

The following code sample shows an example class returning an array-like object emulating built-in bracket notation by explicitly defining property descriptors for all elements. Each property descriptor defines an accessor property with specialized get and set accessors.

/**
* Class emulating built-in bracket notation for lazy materialization without subclassing.
*/
class LazyArray {
    // Define private instance fields:
    #data; // memoized value cache

    /**
    * Returns a new fixed-length "lazy" array.
    *
    * @param {number} len - number of elements
    * @returns {LazyArray} lazy array instance
    */
    constructor( len ) {
        Object.defineProperty( this, 'length', {
            'configurable': false,
            'enumerable': false,
            'writable': false,
            'value': len
        });
        for ( let i = 0; i < len; i++ ) {
            Object.defineProperty( this, i, {
                'configurable': false,
                'enumerable': true,
                'get': this.#get( i ),
                'set': this.#set( i )
            });
        }
        this.#data = {};
    }

    /**
    * Returns a getter.
    *
    * @private
    * @param {number} index - index
    * @returns {Function} getter
    */
    #get( index ) {
        return get;

        /**
        * Returns an element.
        *
        * @private
        * @returns {*} element
        */
        function get() {
            const v = this.#data[ index ];
            if ( v === void 0 ) {
                // Perform "lazy" materialization:
                this.#data[ index ] = index; // note: toy example
                return index;
            }
            return v;
        }
    }

    /**
    * Returns a setter.
    *
    * @private
    * @param {number} index - index
    * @returns {Function} setter
    */
    #set( index ) {
        return set;

        /**
        * Sets an element value.
        *
        * @private
        * @param {*} value - value to set
        * @returns {boolean} boolean indicating whether a value was set
        */
        function set( value ) {
            this.#data[ index ] = value;
            return true;
        }
    }
}

// Create a new "lazy" array:
const x = new LazyArray( 10 );

// Print the list of elements:
for ( let i = 0; i < x.length; i++ ) {
    console.log( x[ i ] );
}

There are several issues with this approach:

Explicitly defining property descriptors is very expensive. Thus, especially for large arrays, instantiation can become prohibitively slow.
Creating separate accessors for each property requires significantly more memory than the accessor protocol. The latter only needs two methods to serve all elements. The former requires two methods for every element.
Element access is orders of magnitude slower than both built-in bracket notation and the accessor protocol.

In the following grouped column chart, I show benchmark results for computing the sum over an array-like object which emulates built-in bracket notation by using property accessors. The chart extends the previous grouped column chart by including the same column groups as the previous chart and adding a new column to each group corresponding to property accessor performance results. As can be observed, using property accessors is more than one hundred times slower than either indexed collection built-in bracket notation or the accessor protocol.

Proxies

The Proxy object allows you to create a proxy for another object. During its creation, the proxy can be configured to intercept and redefine fundamental object operations, including getting and setting properties. While proxy objects are commonly used for logging property accesses, validation, formatting, or sanitizing inputs, they enable novel and extremely powerful extensions to built-in behavior. One such extension—implementing Python-like indexing in JavaScript—will be the subject of a future post.

The following code sample defines a function for creating proxied array-like objects which intercept the operations for getting and setting property values. The proxy is created by providing two parameters:

target: the original object we want to proxy.
handler: an object defining which operations to intercept and how to redefine those operations.

/**
* Tests whether a string contains only integer values.
*
* @param {string} str - input string
* @returns {boolean} boolean indicating whether a string contains only integer values
*/
function isDigitString( str ) {
    return /^\d+$/.test( str );
}

/**
* Returns a proxied array-like object.
*
* @param {number} len - array length
* @returns {Proxy} proxy object
*/
function lazyArray( len ) {
    const target = {
        'length': len
    };
    return new Proxy( target, {
        'get': ( target, property ) => {
            if ( typeof property === 'string' && isDigitString( property ) ) { // guard against symbol keys
                return parseInt( property, 10 ); // note: toy example
            }
            return target[ property ];
        },
        'set': ( target, property, value ) => {
            target[ property ] = value;
            return true;
        }
    });
}

// Create a new "lazy" array:
const x = lazyArray( 10 );

// Print the list of elements:
for ( let i = 0; i < x.length; i++ ) {
    console.log( x[ i ] );
}

While proxy objects avoid many of the issues described above for subclassing, iterators, and property accessors, including random access, instantiation costs, and general complexity, their primary limitation at the time of this blog post is performance.

The following grouped column chart builds on the previous column charts by adding a new column to each group corresponding to proxy object results. As can be observed, proxy objects fare no better than the property accessor approach described above: performance is on par with property accessors and more than one hundred times slower than either built-in bracket notation for indexed collections or the accessor protocol.
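To get a rough feel for this difference on your own machine, the following is a minimal timing sketch. It is not the benchmark behind the chart; the helper names proxiedArray, accessorArray, and timeSum are made up for illustration, and results from a single timed loop will vary considerably across engines and runs.

```javascript
// Toy proxied array-like object (mirrors the `lazyArray` example above):
function proxiedArray( len ) {
    return new Proxy( { 'length': len }, {
        'get': ( target, property ) => {
            if ( typeof property === 'string' && /^\d+$/.test( property ) ) {
                return parseInt( property, 10 );
            }
            return target[ property ];
        }
    });
}

// Toy accessor protocol-compliant equivalent returning the same values:
function accessorArray( len ) {
    return {
        'length': len,
        'get': ( index ) => index,
        'set': () => {}
    };
}

// Sums all elements using a provided element getter and logs the elapsed time:
function timeSum( label, x, getter ) {
    const t0 = Date.now();
    let sum = 0;
    for ( let i = 0; i < x.length; i++ ) {
        sum += getter( x, i );
    }
    console.log( '%s: %d ms', label, Date.now() - t0 );
    return sum;
}

const N = 1e6;
const s1 = timeSum( 'proxy', proxiedArray( N ), ( x, i ) => x[ i ] );
const s2 = timeSum( 'accessor', accessorArray( N ), ( x, i ) => x.get( i ) );
```

Both loops compute the same sum, so any difference in the logged timings comes from how elements are accessed rather than from the work performed.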

Using "at" rather than "get"

The 2022 revision of the ECMAScript Standard added an at method to array and typed array prototypes which accepts a single integer argument and returns the element at that index, allowing for both positive and negative integers. Why, then, do we need another method for retrieving an array element as proposed in the accessor protocol? This question seems especially salient given that the protocol's get method only requires support for nonnegative integer arguments, making the get method seem less powerful.

There are a few reasons why the accessor protocol chooses to use get, rather than at.

The name get has symmetry with the name set.
The at method does not have a built-in method equivalent for setting element values. The set method is only present on typed arrays, not generic arrays, and does not support negative target offsets.
The at method does not match built-in bracket notation semantics. When using a negative integer within square brackets, the integer value is serialized to a string before property lookup (i.e., x[-1] is equivalent to x['-1']). Unless negative integer properties are explicitly defined, x[-1] will return undefined. In contrast, x.at(-1) is equivalent to x[x.length-1], which, for non-empty arrays, will return the last array element. Accordingly, the get method of the accessor protocol allows protocol-compliant implementations to match built-in bracket notation semantics exactly.
The accessor protocol does not specify the behavior of out-of-bounds index arguments. In contrast, when an index argument is negative, the at method normalizes a negative index value according to index + x.length. This, however, is not the only reasonable behavior, depending on the use case. For example, an array-like object implementation may want to clamp out-of-bounds index arguments, such that indices less than zero are clamped to zero (i.e., the first index) and indices greater than x.length-1 are clamped to x.length-1 (i.e., the last index). Alternatively, an array-like object implementation may want to wrap out-of-bounds index arguments using modulo arithmetic. Lastly, an array-like object implementation may want to raise an exception when an index is out-of-bounds. In short, the at method prescribes a particular mode of behavior, which may not be appropriate for all use cases.
By only requiring support for nonnegative integer arguments, the accessor protocol allows protocol-compliant implementations to minimize branching and ensure better performance. While convenient, support for negative indices is not necessary for generalized array-like object iteration.
As the ECMAScript Standard does not define a get method for arrays and typed arrays (at least not yet!), the presence or absence of a get method in combination with a set method and length property allows for distinguishing indexed collections from array-like objects implementing the accessor protocol. The combination of at, set, and length would not be sufficient for making such a distinction. This ability is important in order to allow downstream array-like object consumers to implement optimized code paths and ensure optimal performance.

For these reasons, an at method is not a suitable candidate for use in generalized array-like object iteration.
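To make the semantic differences discussed above concrete, here is a small sketch contrasting bracket notation with at, followed by three alternative ways of handling out-of-bounds indices. The helpers clampIndex, wrapIndex, and strictIndex are hypothetical names introduced only for illustration; they are not part of the accessor protocol or of any built-in API.

```javascript
const arr = [ 1, 2, 3 ];

// Bracket notation looks up the property '-1', which is not defined:
const v1 = arr[ -1 ];
// returns undefined

// `at` normalizes a negative index according to `index + arr.length`:
const v2 = arr.at( -1 );
// returns 3

// Hypothetical helper: clamps an index to the interval [0, len-1] (assumes len > 0):
function clampIndex( index, len ) {
    return Math.min( Math.max( index, 0 ), len-1 );
}

// Hypothetical helper: wraps an index using modulo arithmetic (assumes len > 0):
function wrapIndex( index, len ) {
    return ( ( index % len ) + len ) % len;
}

// Hypothetical helper: throws when an index is out-of-bounds:
function strictIndex( index, len ) {
    if ( index < 0 || index >= len ) {
        throw new RangeError( 'index out-of-bounds: ' + index );
    }
    return index;
}

const i1 = clampIndex( -1, arr.length );
// returns 0

const i2 = clampIndex( 5, arr.length );
// returns 2

const i3 = wrapIndex( -1, arr.length );
// returns 2

const i4 = wrapIndex( 4, arr.length );
// returns 1
```

Because the protocol leaves out-of-bounds behavior unspecified, an implementation is free to apply any one of these modes (or none) inside its get and set methods.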

Examples

Now that we've considered the alternatives and established the motivation and need for the accessor protocol, what can we do with it?! Glad you asked! To answer this question, I provide several concrete implementations below.

Complex number arrays

Complex numbers have applications in many scientific domains, including signal processing, fluid dynamics, and quantum mechanics. We can extend the concept of typed arrays to the realm of complex numbers by storing real and imaginary components as interleaved values within a real-valued typed array. In the following code sample, I define a minimal immutable complex number constructor and a complex number array class implementing the accessor protocol.

/**
* Class defining a minimal immutable complex number.
*/
class Complex {
    // Define private instance fields:
    #re; // real component
    #im; // imaginary component

    /**
    * Returns a new complex number instance.
    *
    * @param {number} re - real component
    * @param {number} im - imaginary component
    * @returns {Complex} complex number instance
    */
    constructor( re, im ) {
        this.#re = re;
        this.#im = im;
    }

    /**
    * Returns the real component of a complex number.
    *
    * @returns {number} real component
    */
    get re() {
        return this.#re;
    }

    /**
    * Returns the imaginary component of a complex number.
    *
    * @returns {number} imaginary component
    */
    get im() {
        return this.#im;
    }
}

/**
* Class defining a complex number array implementing the accessor protocol.
*/
class Complex128Array {
    // Define private instance fields:
    #length; // array length
    #data;   // underlying data buffer

    /**
    * Returns a new complex number array instance.
    *
    * @param {number} len - array length
    * @returns {Complex128Array} complex array instance
    */
    constructor( len ) {
        this.#length = len;
        this.#data = new Float64Array( len*2 ); // accommodate interleaved components
    }

    /**
    * Returns the array length.
    *
    * @returns {number} array length
    */
    get length() {
        return this.#length;
    }

    /**
    * Returns an array element.
    *
    * @param {integer} index - element index
    * @returns {(Complex|void)} element value
    */
    get( index ) {
        if ( index < 0 || index >= this.#length ) {
            return;
        }
        const ptr = index * 2; // account for interleaved components
        return new Complex( this.#data[ ptr ], this.#data[ ptr+1 ] );
    }

    /**
    * Sets an array element.
    *
    * @param {Complex} value - value to set
    * @param {integer} index - element index
    * @returns {void}
    */
    set( value, index ) {
        if ( index < 0 || index >= this.#length ) {
            return;
        }
        const ptr = index * 2; // account for interleaved components
        this.#data[ ptr ] = value.re;
        this.#data[ ptr+1 ] = value.im;
    }
}

// Create a new complex number array:
const x = new Complex128Array( 10 );
// returns <Complex128Array>

// Retrieve the second element:
const z1 = x.get( 1 );
// returns <Complex>

const re1 = z1.re;
// returns 0.0

const im1 = z1.im;
// returns 0.0

// Set the second element:
x.set( new Complex( 3.0, 4.0 ), 1 );

// Retrieve the second element:
const z2 = x.get( 1 );
// returns <Complex>

const re2 = z2.re;
// returns 3.0

const im2 = z2.im;
// returns 4.0

If you are interested in a concrete implementation of complex number arrays, see the Complex128Array and Complex64Array packages provided by stdlib. We'll have more to say about these packages in future blog posts.

Sparse arrays

Applications of sparse arrays commonly arise in network theory, numerical analysis, natural language processing, and other areas of science and engineering. When data is "sparse" (i.e., most elements are zero), sparse array storage can be particularly advantageous in reducing required memory storage and in accelerating the computation of operations involving only non-zero elements.

In the following code sample, I define a minimal accessor protocol-compliant sparse array class using the dictionary of keys (DOK) format and supporting arbitrary fill values. Support for arbitrary fill values is useful as it extends the concept of sparsity to any array having a majority of elements equal to the same value. For such arrays, we can compress an array to a format which stores a single fill value and only those elements which are not equal to the repeated value. This approach is implemented below.

/**
* Class defining a sparse array implementing the accessor protocol.
*/
class SparseArray {
    // Define private instance fields:
    #length; // array length
    #data;   // dictionary containing array elements
    #fill;   // fill value

    /**
    * Returns a new sparse array instance.
    *
    * @param {number} len - array length
    * @param {*} fill - fill value
    * @returns {SparseArray} sparse array instance
    */
    constructor( len, fill ) {
        this.#length = len;
        this.#data = {};
        this.#fill = fill;
    }

    /**
    * Returns the array length.
    *
    * @returns {number} array length
    */
    get length() {
        return this.#length;
    }

    /**
    * Returns an array element.
    *
    * @param {number} index - element index
    * @returns {*} element value
    */
    get( index ) {
        if ( index < 0 || index >= this.#length ) {
            return;
        }
        const v = this.#data[ index ];
        if ( v === void 0 ) {
            return this.#fill;
        }
        return v;
    }

    /**
    * Sets an array element.
    *
    * @param {*} value - value to set
    * @param {number} index - element index
    * @returns {void}
    */
    set( value, index ) {
        if ( index < 0 || index >= this.#length ) {
            return;
        }
        this.#data[ index ] = value;
    }
}

// Create a new sparse array:
const x = new SparseArray( 10, 0.0 );

// Retrieve the second element:
const v1 = x.get( 1 );
// returns 0.0

// Set the second element:
x.set( 4.0, 1 );

// Retrieve the second element:
const v2 = x.get( 1 );
// returns 4.0
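One possible refinement, which is not part of the class above, is to drop a stored entry whenever an element is set back to the fill value, so that the dictionary only ever holds elements which differ from the fill. The following sketch assumes a hypothetical class name CompactSparseArray and a hypothetical stored property, uses a Map in place of a plain object, and compares values using strict equality (so a NaN fill value would need special handling).

```javascript
/**
* Sketch of a sparse array which only stores elements differing from the fill value.
*/
class CompactSparseArray {
    #length; // array length
    #data;   // map from index to explicitly stored value
    #fill;   // fill value

    constructor( len, fill ) {
        this.#length = len;
        this.#data = new Map();
        this.#fill = fill;
    }

    get length() {
        return this.#length;
    }

    // Number of explicitly stored elements (hypothetical helper property):
    get stored() {
        return this.#data.size;
    }

    get( index ) {
        if ( index < 0 || index >= this.#length ) {
            return;
        }
        return ( this.#data.has( index ) ) ? this.#data.get( index ) : this.#fill;
    }

    set( value, index ) {
        if ( index < 0 || index >= this.#length ) {
            return;
        }
        if ( value === this.#fill ) {
            // Setting the fill value removes any stored entry:
            this.#data.delete( index );
            return;
        }
        this.#data.set( index, value );
    }
}

const y = new CompactSparseArray( 10, 0.0 );

y.set( 4.0, 1 );
const n1 = y.stored;
// returns 1

y.set( 0.0, 1 );
const n2 = y.stored;
// returns 0

const w = y.get( 1 );
// returns 0.0
```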

Lazy arrays

While less broadly applicable, situations may arise in which you want an array-like object supporting lazy materialization and random access. For example, suppose each element is the result of an expensive computation, and you want to defer the computation of each element until first accessed.

In the following code sample, I define a class supporting lazy materialization of randomly generated element values. When an element is accessed, a class instance eagerly computes all un-computed element values up to and including the accessed element. Once an element value is computed, the value is memoized and can only be overridden by explicitly setting the element.

/**
* Class defining an array-like object supporting lazy materialization of random values.
*/
class LazyRandomArray {
    // Define private instance fields:
    #data;   // underlying data buffer

    /**
    * Returns a new lazy random array.
    *
    * @returns {LazyRandomArray} new instance
    */
    constructor() {
        this.#data = [];
    }

    /**
    * Materializes array elements.
    *
    * @private
    * @param {number} len - array length
    */
    #materialize( len ) {
        for ( let i = this.#data.length; i < len; i++ ) {
            this.#data.push( Math.random() );
        }
    }

    /**
    * Returns the array length.
    *
    * @returns {number} array length
    */
    get length() {
        return this.#data.length;
    }

    /**
    * Returns an array element.
    *
    * @param {number} index - element index
    * @returns {*} element value
    */
    get( index ) {
        if ( index < 0 ) {
            return;
        }
        if ( index >= this.#data.length ) {
            this.#materialize( index+1 );
        }
        return this.#data[ index ];
    }

    /**
    * Sets an array element.
    *
    * @param {*} value - value to set
    * @param {number} index - element index
    * @returns {void}
    */
    set( value, index ) {
        if ( index < 0 ) {
            return;
        }
        if ( index >= this.#data.length ) {
            // Materialize `index+1` in order to ensure "fast" elements:
            this.#materialize( index+1 );
        }
        this.#data[ index ] = value;
    }
}

// Create a new lazy array:
const x = new LazyRandomArray();

// Retrieve the tenth element:
const v1 = x.get( 9 );
// returns <number>

// Set the tenth element:
x.set( 4.0, 9 );

// Retrieve the tenth element:
const v2 = x.get( 9 );
// returns 4.0

// Return the number of elements in the array:
const len = x.length;
// returns 10

stdlib

While array-like objects implementing the accessor protocol are useful in their own right, they become all the more powerful when combined with functional APIs which are accessor protocol-aware. Fortunately, stdlib treats accessor protocol-compliant objects as first-class citizens, providing support for them throughout its codebase.

For example, the following code sample uses @stdlib/array-put to replace the elements of an accessor protocol-compliant strided array at specified indices.

const put = require( '@stdlib/array-put' );

/**
* Class defining a strided array.
*/
class StridedArray {
    // Define private instance fields:
    #length; // array length
    #data;   // underlying data buffer
    #stride; // step size (i.e., the index increment between successive values)
    #offset; // index of the first indexed value in the data buffer

    /**
    * Returns a new StridedArray instance.
    *
    * @param {integer} N - number of indexed elements
    * @param {ArrayLikeObject} data - underlying data buffer
    * @param {number} stride - step size
    * @param {number} offset - index of the first indexed value in the data buffer
    * @returns {StridedArray} strided array instance
    */
    constructor( N, data, stride, offset ) {
        this.#length = N;
        this.#data = data;
        this.#stride = stride;
        this.#offset = offset;
    }

    /**
    * Returns the array length.
    *
    * @returns {number} array length
    */
    get length() {
        return this.#length;
    }

    /**
    * Returns the element located at a specified index.
    *
    * @param {number} index - element index
    * @returns {(void|*)} element value
    */
    get( index ) {
        return this.#data[ this.#offset + index*this.#stride ];
    }

    /**
    * Sets the value for an element located at a specified index.
    *
    * @param {*} value - value to set
    * @param {number} index - element index
    */
    set( value, index ) {
        this.#data[ this.#offset + index*this.#stride ] = value;
    }
}

// Define a data buffer:
const buf = new Float64Array( [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 ] );

// Create a strided view over the data buffer:
const x1 = new StridedArray( 4, buf, 2, 1 );

// Retrieve the second element:
const v1 = x1.get( 1 );
// returns 4.0

// Retrieve the fourth element:
const v2 = x1.get( 3 );
// returns 8.0

// Replace the second and fourth elements with new values:
put( x1, [ 1, 3 ], [ -v1, -v2 ] );

// Retrieve the second element:
const v3 = x1.get( 1 );
// returns -4.0

// Retrieve the fourth element:
const v4 = x1.get( 3 );
// returns -8.0

In addition to supporting accessor protocol-compliant array-like objects in utilities, linear algebra operations, and other vectorized APIs, stdlib has leveraged the accessor protocol to implement typed arrays supporting data types beyond real-valued numbers. To see this in action, see stdlib's Complex128Array, Complex64Array, and BooleanArray typed array constructors.

In short, the accessor protocol is a powerful abstraction which is not only performant, but can accommodate new use cases with minimal effort.
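To give a sense of what "accessor protocol-aware" can mean in practice, the following is a simplified sketch of a consumer which checks for get, set, and length and then branches to an appropriate code path. The function names isAccessorArray and sum are assumptions made for this illustration; this is not stdlib's actual implementation.

```javascript
// Tests whether a value appears to implement the accessor protocol:
function isAccessorArray( x ) {
    return (
        x !== null &&
        typeof x === 'object' &&
        typeof x.get === 'function' &&
        typeof x.set === 'function' &&
        typeof x.length === 'number'
    );
}

// Sums the elements of either an indexed collection or an accessor array:
function sum( x ) {
    let total = 0;
    if ( isAccessorArray( x ) ) {
        // Accessor protocol code path:
        for ( let i = 0; i < x.length; i++ ) {
            total += x.get( i );
        }
        return total;
    }
    // Built-in bracket notation code path:
    for ( let i = 0; i < x.length; i++ ) {
        total += x[ i ];
    }
    return total;
}

// A toy accessor array whose elements are 2, 4, and 6:
const evens = {
    'length': 3,
    'get': ( index ) => ( index+1 ) * 2,
    'set': () => {}
};

const t1 = sum( [ 1, 2, 3 ] );
// returns 6

const t2 = sum( new Float64Array( [ 1.0, 2.0, 3.0 ] ) );
// returns 6

const t3 = sum( evens );
// returns 12
```

Note that typed arrays have a set method but no get method, so they correctly fall through to the bracket notation code path.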

Conclusion

In this post, we dove deep into techniques for array-like object iteration. Along the way, we discussed the limitations of current approaches and identified opportunities for a lightweight means for element retrieval that can flexibly accommodate a variety of use cases, including strided arrays, arrays supporting deferred computation, shared memory views, and sparse arrays. We then learned about the accessor protocol which provides a straightforward solution for accessing elements in a manner consistent with built-in bracket notation and having minimal performance overhead. With the power and promise of the accessor protocol firmly established, we wrapped up by showcasing a few demos of the accessor protocol in action.

In short, we covered a lot of ground, but I hope you learned a thing or two along the way. In future posts, we'll explore more applications of the accessor protocol, including in the implementation of complex number and boolean typed arrays. We hope that you'll continue to follow along as we share our insights and that you'll join us in our mission to realize a future where JavaScript and the web are preferred environments for numerical and scientific computation. 🚀

If you'd like to view the code covered in this post on GitHub, please visit the source code repository.

How to call Fortran routines from JavaScript with Node.js

Athan — Wed, 24 Jul 2024 02:25:09 +0000

Fortran is a commonly used language for numerical and scientific computation, underpinning many of the higher-level numerical libraries and programming languages in use today. Since Fortran's original development in 1957, researchers and software developers have used Fortran as a primary language for high-performance computation and authored thousands of high-performance programs and libraries for astronomy, climate modeling, computational chemistry, fluid dynamics, simulation, weather prediction, and more.

Rather than attempt to re-implement the entirety of the Fortran ecosystem, programming languages, such as R, MATLAB, and Julia, and numerical libraries, such as NumPy and SciPy, have opted to provide language-specific wrappers around Fortran functionality. Despite significant interest in numerical computing on the web, no one has developed comprehensive JavaScript bindings for Fortran libraries. That is, until now.

In this post, we'll begin laying the groundwork for authoring high-performance Fortran bindings and explore how to call Fortran routines from JavaScript using Node.js. We'll start with a brief introduction to Fortran, followed by writing and compiling a simple Fortran program. We'll then discuss how to use Node-API to link a compiled Fortran routine to the Node.js runtime. And we'll conclude by demonstrating how to use stdlib to simplify the authoring of Node.js bindings.

By the end of this post, you'll have a good understanding of how to call Fortran routines from JavaScript using Node.js.

Prerequisites

Throughout this post, we'll be writing sample programs and performing various steps to compile and run Fortran programs. We'll assume that you have some familiarity with using the terminal, executing commands, and running JavaScript programs. For the most part, terminal commands will assume a Linux-based operating system. Some modifications may be required to successfully run commands and perform compilation steps on Windows.

If you're hoping to follow along, you'll need the following prerequisites:

You'll want to make sure you've installed the latest stable Node.js version. To check whether Node.js is already installed
```
$ node --version
```
where $ is the terminal prompt and node --version is the entered command.
We'll be using npm for installing Node.js dependencies, but you should be able to adapt any installation commands to your preferred JavaScript package manager (e.g., Yarn, pnpm, etc).
In order to generate build files appropriate for your operating system (OS), we'll be using node-gyp, which, in turn, has varying prerequisites depending on your OS, including the availability of Python. For more details, see the node-gyp installation instructions.
In order to compile Fortran programs, you'll need a Fortran compiler. In this post, we'll be using GNU Fortran (GFortran) to compile Fortran code. GFortran is an implementation of the Fortran programming language in the widely used GNU Compiler Collection (GCC), an open-source project maintained under the umbrella of the GNU Project. To check whether GFortran is already installed
```
$ gfortran --version
```
And finally, we'll be using GCC to compile and link C source code. To check whether GCC is already installed
```
$ gcc --version
```

If you don't have one or more of the above installed, you'll want to go ahead and install those now.

Introduction to Fortran

Fortran is a compiled, imperative programming language well-suited to numerical and scientific computation. Known for its high performance, versatility, and ease of use, Fortran is natively parallel and has built-in support for array handling. This makes Fortran a popular choice for scientific computing.

Many fundamental libraries for numerical computation, such as BLAS (basic linear algebra subprograms), LAPACK (linear algebra package), SLATEC, and MINPACK, among many others, are written in Fortran. These libraries serve as the foundation of popular open-source numerical computation libraries, such as NumPy and SciPy, and numerical programming languages, such as R, MATLAB, and Julia.

Given Fortran's widespread usage and decades of development, one could argue that most modern numerical programming languages and libraries are simply fancy wrappers around Fortran routines. Therefore, enabling JavaScript to call Fortran routines not only leverages these high-performance libraries but also positions JavaScript as a viable language for machine learning and other computation-intensive tasks.

Now, let's get started by compiling our first Fortran program!

Compiling our first Fortran program

Recognizing that some readers of this post may not be familiar with Fortran, let's kick things off by writing a "Hello world"-style Fortran program which adds two numbers and prints the result. To begin, open up a text editor and create the file add.f90 containing the following code, which defines a function for adding two integers and a main program which calls that function and prints the result.

! file: add.f90

!>
! Adds two integer values.
!
! @param {integer} x - first input value
! @param {integer} y - second input value
!<
integer function add( x, y )
    ! Define the input parameters:
    integer, intent(in) :: x, y
    ! ..
    ! Compute the sum:
    add = x + y
end function add

!>
! Main execution sequence.
!<
program main
    ! Local variables:
    character(len=999) :: str, tmp
    ! ..
    ! Intrinsic functions:
    intrinsic adjustl, trim
    ! ..
    ! Define a variable for storing the sum:
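    integer :: res
    ! ..
    ! External functions (declare the return type, as `add` would otherwise be implicitly typed as real):
    integer :: add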
    ! ..
    ! Compute the sum:
    res = add( 12, 15 )
    ! ..
    ! Print the results:
    write (str, '(I15)') res
    tmp = adjustl( str )
    print '(A, A)', 'The sum of 12 and 15 is ', trim( tmp )
end program

There are a few things to note in the above program. The first is that, in general, Fortran routines pass arguments by reference. A common practice is to define and pass output variables for storing results—something that we'll revisit later in this post.

Second, a best practice is to specify the intent(xx) of a variable. In the code above, intent(in) indicates that an argument must not be redefined or become undefined during the execution of a subroutine. Similarly, intent(out) indicates that an argument must be defined before the argument is referenced within a subroutine.

Third, in order to print formatted results, we need to perform various string manipulation steps, including writing to character buffers (write), adjusting alignment (adjustl), and trimming results (trim).

For the purposes of getting something working, our program defines a single variable res, which receives the result of passing two number literals to an add function. To test whether the code works, we first need to see if it compiles, and, to do this, we'll use the GNU Fortran (GFortran) compiler, which is part of the GNU Compiler Collection (GCC). While other Fortran compilers exist, such as the Intel Fortran Compiler, LLVM Flang, and LFortran, GFortran is one of the most widely used Fortran compilers, and what we cover in this post should readily translate elsewhere.

In a terminal, navigate to the directory containing add.f90, and execute the following command

$ gfortran add.f90 -o add.out && ./add.out

where add.f90 is the file path of the file to be compiled and add.out is the file path to use for storing a generated executable. If all went according to plan, you should see the following text as output

The sum of 12 and 15 is 27

Defining another Fortran subroutine

In add.f90, we defined a self-contained Fortran program which adds two numbers and prints the result. But what if we want to call Fortran functions and subroutines from another Fortran file or from outside of Fortran, such as from JavaScript running in Node.js?

To see how this is done, let's begin by creating another Fortran file mul.f90, this time containing a subroutine for multiplying two integers and returning an integer result.

! file: mul.f90

!>
! Multiplies two integer values.
!
! @param {integer} x - first input value
! @param {integer} y - second input value
! @param {integer} res - output argument for storing the result
!<
subroutine mul( x, y, res )
    integer, intent(in) :: x, y
    integer, intent(out) :: res
    res = x * y
end subroutine mul

Similar to add, mul takes two input parameters x and y, but this time mul is a subroutine which takes an output parameter res for storing the result.
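For readers more at home in JavaScript, the output parameter convention can be loosely mimicked with typed arrays. The sketch below is only an analogy and is not how Fortran is invoked from Node.js: each one-element Int32Array stands in for a reference to an integer, and it assumes that a default Fortran integer is 32 bits wide, which is common but compiler-dependent.

```javascript
// JavaScript analogy of `subroutine mul( x, y, res )`:
function mul( x, y, res ) {
    // `Math.imul` performs 32-bit integer multiplication:
    res[ 0 ] = Math.imul( x[ 0 ], y[ 0 ] );
}

// Each one-element array stands in for a reference to an integer:
const x = new Int32Array( [ 4 ] );
const y = new Int32Array( [ 5 ] );
const res = new Int32Array( 1 );

// Compute the product, writing the result to the output argument:
mul( x, y, res );

const v = res[ 0 ];
// returns 20
```

As with the subroutine, the function returns nothing; the caller provides the storage in which the result is placed.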

If we try compiling mul.f90 as we did with add.f90,

$ gfortran mul.f90 -o mul.out

we'll encounter an error message similar to the following

Undefined symbols for architecture arm64:
  "_main", referenced from:
      <initial-undefines>
ld: symbol(s) not found for architecture arm64
collect2: error: ld returned 1 exit status

In order to successfully generate a standalone executable, Fortran code must have a main program providing an entry point for execution. Without this entry point, a Fortran compiler does not know where to begin executing code or where to look to identify the procedures and functions necessary to run a program.

For mul.f90, we're not wanting Fortran to drive execution, and, instead, we're interested in defining an entry point outside of Fortran which will enable a JavaScript runtime to drive execution. This means that we need to figure out a way to establish a bridge between a JavaScript runtime exposing native APIs and Fortran code containing APIs which we want to use. In order to establish such a bridge, we need to disentangle two compiler phases: compilation and linking.

Linking

At a high level, compilation is the process of translating one programming language into another programming language. Often this means taking expressions written in a higher-level language, such as Fortran, and translating them to a lower-level language, such as machine code, in order to create an executable program that a machine can natively understand. The output of compilation is one or more object files, which typically have .o or .obj filename extensions.

Linking is the process of taking one or more object files and combining them into a single executable file. During linking, a "linker" performs several tasks:

symbol resolution: resolving references to functions and variables across different object files.
address binding: assigning final memory addresses to a program's functions and variables.
library inclusion: including code from static or dynamic libraries as required.
executable creation: producing the final executable file that can be run on a target system.
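As a toy model of the first task, symbol resolution, consider the following sketch. It is a deliberate simplification: the "object files" are plain JavaScript objects, the file name caller.o is hypothetical, and real symbol names typically carry compiler- and platform-specific decorations (such as the leading underscore in "_main" in the error message above).

```javascript
// Simplified "object files" listing the symbols they define and reference:
const mulObj = {
    'name': 'mul.o',
    'defines': [ 'mul' ],
    'references': []
};
const callerObj = {
    'name': 'caller.o', // hypothetical object file providing an entry point
    'defines': [ 'main' ],
    'references': [ 'mul' ]
};

// Toy "linker" which only performs symbol resolution:
function link( objects ) {
    const defined = new Set();
    const required = new Set( [ 'main' ] ); // an executable needs an entry point
    for ( const obj of objects ) {
        for ( const s of obj.defines ) {
            defined.add( s );
        }
        for ( const s of obj.references ) {
            required.add( s );
        }
    }
    const missing = [ ...required ].filter( ( s ) => !defined.has( s ) );
    if ( missing.length > 0 ) {
        throw new Error( 'Undefined symbols: ' + missing.join( ', ' ) );
    }
    return [ ...defined ].sort();
}

// Linking `mul.o` by itself fails, as nothing defines `main`:
let message = '';
try {
    link( [ mulObj ] );
} catch ( err ) {
    message = err.message;
}
// message => 'Undefined symbols: main'

// Linking both object files resolves every symbol:
const symbols = link( [ mulObj, callerObj ] );
// returns [ 'main', 'mul' ]
```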

When we ran the GFortran command above

$ gfortran mul.f90 -o mul.out

the compiler attempted to perform both compilation and linking. However, if we're trying to combine compiled Fortran code with a separate library (or a runtime such as Node.js), we need to split compilation and linking into separate steps.

Accordingly, in order to just generate the object file, we can amend the previous command as follows

$ gfortran -c mul.f90

where the -c flag instructs the compiler to compile, but not to link. After running this command from the same directory as mul.f90, you should see a mul.o (or mul.obj) file containing the compiled source code.

Linking Fortran files

To demonstrate linking as a separate phase, create a mul_script.f90 file containing the following code, which defines a main program which calls the mul subroutine and prints the result.

! file: mul_script.f90

!>
! Main execution sequence.
!<
program main
    ! Local variables:
    character(len=999) :: str, tmp
    ! ..
    ! Intrinsic functions:
    intrinsic adjustl, trim
    ! ..
    ! Define a variable for storing the product:
    integer :: res
    ! ..
    ! Call the `mul` subroutine to compute the product:
    call mul( 4, 5, res )
    ! ..
    ! Print the results:
    write (str, '(I15)') res
    tmp = adjustl( str )
    print '(A, A)', 'The product of 4 and 5 is ', trim( tmp )
end program

We can then perform the same compilation step as we did for mul.f90.

$ gfortran -c mul_script.f90

At this point, we should have two object files: mul.o and mul_script.o (or mul.obj and mul_script.obj, respectively). To link them into a single executable, we can run the following command in which we define the path of the output executable and pass in the paths of the object files we wish to link.

$ gfortran -o mul_script.out mul.o mul_script.o

Once linked, we can test that everything works by running the generated executable.

$ ./mul_script.out

If all went according to plan, you should see the following text as output

The product of 4 and 5 is 20

At this point, we've successfully compiled and linked together separate Fortran source files, and we can now turn our attention to linking compiled Fortran to non-Fortran code.

Linking Fortran and C

A common scenario in numerical computing is exposing numerical computing libraries written in Fortran as C functions. C also happens to be the programming language used by Node.js to expose APIs for building native add-ons (i.e., extensions to the Node.js runtime). Accordingly, if we can figure out how to link Fortran to C, we'll be well on our way to creating a Node.js native add-on capable of calling Fortran routines.

Writing Fortran wrappers

While the mul subroutine defined above can be used in conjunction with other Fortran files, we cannot simply call mul from C as we do in Fortran because Fortran expects arguments to be passed by reference rather than by value. It's also worth mentioning that, because Fortran functions can only return scalar values and not, e.g., pointers to arrays, general best practice is to expose Fortran functions as subroutines, which are the equivalent of C functions returning void and which allow passing pointers for storing output return values.

While mul is already a subroutine, if we wanted to expose add to C, we'd first need to wrap add as a subroutine in a manner similar to the following code snippet containing the subroutine wrapper addsub which forwards input arguments to add and assigns the result to an output argument res.

!>
! Wraps `add` as a subroutine.
!
! @param {integer} x - first input value
! @param {integer} y - second input value
! @param {integer} res - output argument for storing the result
!<
subroutine addsub( x, y, res )
    implicit none
    ! ..
    ! External functions:
    interface
        integer function add( x, y )
            integer :: x, y
        end function add
    end interface
    ! ..
    integer, intent(in) :: x, y
    integer, intent(out) :: res
    ! ..
    res = add( x, y )
    return
end subroutine addsub

Defining function prototypes in C

With those preliminaries out of the way, to help the C compiler reason about functions defined elsewhere (e.g., in a Fortran library or in other source files), we need to define function prototypes for any functions we plan to use before we use them. For our use case of calling a single Fortran routine, we can create a mul_fortran.h header file containing a single function declaration for the mul subroutine.

// file: mul_fortran.h

#ifndef MUL_FORTRAN_H
#define MUL_FORTRAN_H

#ifdef __cplusplus
extern "C" {
#endif

void mul( const int *x, const int *y, int *res );

#ifdef __cplusplus
}
#endif

#endif

One thing to note is that, in the above header file, we prevent name mangling by using extern "C". This is common practice in order to facilitate interoperation of C and C++, and preventing name mangling helps avoid compiler errors if we decide to use mul in C++ in the future.

Calling Fortran routines from C

Next, similar to how we created a Fortran program for calling a Fortran function defined in a separate file, we can create a main.c file containing a main function which calls mul and prints the result.

// file: main.c

#include "mul_fortran.h"
#include <stdio.h>

int main( void ) {
    int x = 4;
    int y = 5;
    int res;

    // Compute the product, passing arguments by reference:
    mul( &x, &y, &res );

    printf( "The product of %d and %d is %d\n", x, y, res );
    return 0;
}

Compiling C and Fortran

To compile our C program, we can run the following command

$ gcc -I . -c main.c

where -I . adds the current directory to the list of directories searched for header files, allowing the compiler to find mul_fortran.h and the function declaration it contains.

Before linking main.o and mul.o, we first need to recompile mul.f90, making sure to instruct GFortran to not modify function names by appending underscores during compilation. This ensures that the name used in our C code matches the exported symbol from compiled Fortran. One should be careful, however, as non-mangled names may conflict with existing symbols defined in C.

To prevent GFortran from appending underscores to symbol names, we set the -fno-underscoring compiler option when calling GFortran.

$ gfortran -fno-underscoring -c mul.f90
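
An alternative to -fno-underscoring, sketched below under stated assumptions, is to leave GFortran's default naming alone and have the C side refer to the decorated symbol instead. The macro FORTRAN_SYMBOL and the function mul_via_symbol are hypothetical names, and mul_ is implemented here in C as a stand-in so that the sketch compiles without a Fortran compiler; in a real build, that symbol would come from a mul.o compiled without -fno-underscoring.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: append the trailing underscore which GFortran adds to external names by default. */
#define FORTRAN_SYMBOL( name ) name##_

/* Stand-in for the compiled Fortran subroutine (assumption: exported as `mul_`). */
void FORTRAN_SYMBOL( mul )( const int *x, const int *y, int *res ) {
    assert( x != NULL && y != NULL && res != NULL );
    *res = ( *x ) * ( *y );
}

/* By-value convenience wrapper which hides the decorated name from callers. */
int mul_via_symbol( int x, int y ) {
    int res;
    FORTRAN_SYMBOL( mul )( &x, &y, &res );
    return res;
}
```

The trade-off is that name decoration is compiler-specific, so a macro like this would need to be adapted for other Fortran compilers.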

Now that we've compiled our source files, it's time to generate an executable!

$ gcc -o main.out main.o mul.o

Depending on your operating system, the previous command may fail with linker errors, in which case you may need to modify it to

$ gcc -o main.out main.o mul.o -lgfortran

where -lgfortran instructs GCC to link to the standard Fortran libraries. And finally, to test that everything works, we run the executable by entering the following command

$ ./main.out

If successful, you should see the following text as output

The product of 4 and 5 is 20

Phew! If you're new to Fortran and C, congratulations on making it this far!

Now that we've successfully managed to link Fortran and C code, we can turn our attention to using Node.js native add-ons to call Fortran routines from JavaScript.

Node-API

Node-API is an API for building Node.js native add-ons (i.e., extensions to the Node.js JavaScript runtime). There's a long history of add-on evolution and development in Node.js, of which I'll spare you the details. The real benefit of Node-API is in providing a stable Application Binary Interface (ABI), which insulates add-ons from changes in the underlying JavaScript engine (namely, V8) and which allows modules compiled for one version of Node.js to run on later versions of Node.js without recompilation. In short, Node-API provides the glue code, in the form of C APIs, necessary for us to extend Node.js capabilities with C/C++ code written and compiled independently of Node.js itself.

In order to access Node-API APIs, we need to do two things:

Include the <node_api.h> header in our C files.
Compile C source files using Node-API APIs with node-gyp, a build system based on Google's GYP, a meta-build system for generating other build systems.

So without further ado...

Creating an add-on file

Let's start by creating an addon.c file which will serve as an entry point for our native add-on. In this file, we'll define two functions—addon and Init—and register a Node-API module which exports a function in a manner similar to how we'd export a function if writing a module in vanilla JavaScript.

// file: addon.c

#include <node_api.h>
#include <assert.h>

/**
* Receives JavaScript callback invocation data.
*
* @param env    environment under which the function is invoked
* @param info   callback data
* @return       Node-API value
*/
static napi_value addon( napi_env env, napi_callback_info info ) {

    // NOTE: we'll add code here later in this post

    return NULL;
}

/**
* Defines the Node.js module "exports" object for the native add-on.
*
* @param env      environment under which the function is invoked
* @param exports  exports object
* @return         Node-API value
*/
static napi_value Init( napi_env env, napi_value exports ) {
    napi_value fcn;

    // Export the add-on function as a "default" export:
    napi_status status = napi_create_function( env, "exports", NAPI_AUTO_LENGTH, addon, NULL, &fcn );

    // Verify that we successfully wrapped the `addon` function as a JavaScript function object:
    assert( status == napi_ok );

    // Return the JavaScript function object to allow registering with the JavaScript runtime:
    return fcn;
}

/**
* Register a Node-API module which exports a function.
*/
NAPI_MODULE( NODE_GYP_MODULE_NAME, Init )

The addon.c file consists of three parts:

addon: this function receives JavaScript invocation data. If we assume foo() is a JavaScript function exposed by a native add-on, env is the environment in which the JavaScript code runs and info is an opaque object which can be used to retrieve function arguments and other contextual data when foo is invoked.
Init: similar to how module.exports defines the APIs a Node.js module exposes to other Node.js modules, this function defines the "exports" object and initializes exported values. In this context, initialization typically means wrapping C APIs as JavaScript objects so that a JavaScript engine can pass data back and forth between JavaScript and native code.
NAPI_MODULE: this is a macro exposed by Node-API for registering a Node-API module with the Node.js JavaScript runtime.

At this point, we're starting to accumulate a number of moving parts: Fortran source files, GFortran, C source files, GCC, Node-API, and the previously mentioned, but not yet explained, node-gyp.

As may be observed in the diagram above, a key component which we have yet to cover, but which is necessary to allow building a Node.js native add-on in a manner that is portable across platforms, is the binding.gyp file. It's this file and node-gyp that we'll dive into next.

node-gyp

node-gyp is a build system based on Google's GYP, which, in turn, is a meta-build system for generating other build systems. The key idea behind GYP is the generation of build files, such as Makefiles, Ninja build files, Visual Studio projects, and Xcode projects, which are tailored to the platform on which a project is being compiled. Once GYP scaffolds a project in a manner tailored to the host platform, GYP can then perform build steps which replicate as closely as possible the way that one would have set up a native build of the project were one writing the project build system from scratch. node-gyp subsequently extends GYP by providing the configuration and tooling specific to developing Node.js native add-ons.

Configuring how to build an add-on

In order to describe the configuration necessary to build a Node.js native add-on, one needs to provide a binding.gyp file. This file is written in a JSON-like format and is placed at the root of a JavaScript package alongside a package's package.json file. GYP configuration files can be awkward to write, and, unfortunately, GYP has long been abandoned by the Google team responsible for its creation. Adding insult to injury, good documentation for authoring GYP files can be hard to come by: the GYP documentation is incomplete, and finding real-world examples which do exactly what you want can be time-consuming, especially when authoring binding.gyp files requiring specialized configuration (e.g., as might be needed when compiling CUDA, OpenCL, or Fortran).

Nevertheless, persist we shall! Fortunately, writing a minimal binding.gyp file capable of supporting Fortran compilation is within reach. Start by creating a binding.gyp file specifying various configuration parameters, including build targets, source files, compiler flags, and rules for how to process files having a specific file type.

# file: binding.gyp

# A `.gyp` file for building a Node.js native add-on.
#
# [1]: https://gyp.gsrc.io/docs/InputFormatReference.md
# [2]: https://gyp.gsrc.io/docs/UserDocumentation.md
{
  # Define variables to be used throughout the configuration for all targets:
  'variables': {
    # Set variables based on the host OS:
    'conditions': [
      [
        'OS=="win"',
        {
          # Define the object file suffix on Windows:
          'obj': 'obj',
        },
        {
          # Define the object file suffix for other operating systems (e.g., Linux and macOS):
          'obj': 'o',
        }
      ],
    ],
  },

  # Define compilation targets:
  'targets': [
    # Define a target to generate an add-on:
    {
      # The target name should match the add-on export name (see addon.c above):
      'target_name': 'addon',

      # List of source files:
      'sources': [
        # Relative paths should be relative to this configuration file...
        './addon.c',
        './mul.f90',
      ],

      # List directories which contain relevant headers to include during compilation:
      'include_dirs': [
        # Relative paths should be relative to this configuration file...
        './',
      ],

      # Define settings which should be applied when a target's object files are used as linker input:
      'link_settings': {
        # Define linker flags for libraries against which to link (e.g., '-lm', '-lblas', etc):
        'libraries': [],

        # Define directories in which to find libraries to link to (e.g., '/usr/lib'):
        'library_dirs': []
      },

      # Define custom build actions for particular source files:
      'rules': [
        {
          # Define a rule name:
          'rule_name': 'compile_fortran',

          # Define the filename extension for which this rule should apply:
          'extension': 'f90',

          # Set a flag specifying whether to process generated output as sources for subsequent steps:
          'process_outputs_as_sources': 1,

          # Define the pathnames to be used as inputs when performing processing:
          'inputs': [
            # Full path of the current input:
            '<(RULE_INPUT_PATH)',
          ],

          # Define the outputs produced during processing:
          'outputs': [
            # Store an output object file in a directory for placing intermediate results (only accessible within a single target):
            '<(INTERMEDIATE_DIR)/<(RULE_INPUT_ROOT).<(obj)',
          ],

          # Define the command-line invocation:
          'action': [
            'gfortran',
            '-fno-underscoring',
            '-c',
            '<@(_inputs)',
            '-o',
            '<@(_outputs)',
          ],
        },
      ],
    },
  ],
}

A few comments:

GYP configuration files support variables, conditionals, and expressions. In the configuration file above, <(RULE_INPUT_PATH), <(INTERMEDIATE_DIR), and <(RULE_INPUT_ROOT) are predefined variables provided by the GYP generator module. The <@(_inputs) and <@(_outputs) expressions are variable expansions evaluated in a list context, so each element of the rule's inputs and outputs lists becomes a separate command-line argument.
While GYP attempts to automate and abstract away the generation of build files tailored to the operating system on which to compile, this doesn't absolve us from needing to consider platform variability. For example, the configuration file above includes a conditional for resolving an appropriate object file filename extension based on the target operating system.
Configuration files can quickly become complex depending on operating system variability, including the availability of specialized compilers, such as GFortran, and the need for bespoke rules for varying input file types.

Building an add-on

Now that we have a GYP configuration file, it's time to install node-gyp. In your terminal, run

$ npm install --no-save node-gyp

The node-gyp executable will subsequently be available in the ./node_modules/.bin directory. To generate the appropriate project build files for the current platform, run the following command

$ ./node_modules/.bin/node-gyp configure

This will generate a ./build directory containing platform-specific build files. To build the native add-on, we can run

$ ./node_modules/.bin/node-gyp build

which will generate an addon.node file in a ./build/Release sub-folder. To remove generated files, run

$ ./node_modules/.bin/node-gyp clean

As we continue to iterate on our addon.c file, we'll want to perform the clean-configure-build sequence each time we make changes. Accordingly, we can consolidate the above steps into a single command

$ ./node_modules/.bin/node-gyp clean && \
  ./node_modules/.bin/node-gyp configure && \
  ./node_modules/.bin/node-gyp build

Calling a Fortran routine from JavaScript

At this point, we've got almost all of the core building blocks for calling a Fortran routine from JavaScript. We're only missing two things:

Logic in addon.c which calls the Fortran routine.
A JavaScript file which invokes the function exposed by our native add-on.

Updating the add-on file

To start, let's revisit our addon.c file. In this file, we need to make four changes:

1. Retrieve provided arguments.
2. Convert from JavaScript objects to native C types.
3. Add logic to call our Fortran routine mul.
4. Return a result as a JavaScript object.

Luckily, we already gained experience with (3) when we wrote main.c and linked against our compiled Fortran routine. As in main.c, we want to include the mul_fortran.h header, which we can do by making the following change in addon.c

// file: addon.c

+ #include "mul_fortran.h"
#include <node_api.h>
#include <assert.h>

Next, we'll want to modify the addon function in addon.c to include logic for calling the mul Fortran routine. In the snippet below, we copy the invocation logic used in main.c into the implementation of the addon function.

/**
* Receives JavaScript callback invocation data.
*
* @param env    environment under which the function is invoked
* @param info   callback data
* @return       Node-API value
*/
static napi_value addon( napi_env env, napi_callback_info info ) {

    // ...

    // Call the Fortran routine:
    int res;
    mul( &x, &y, &res );

    // ...

    return NULL;
}

Now on to argument munging. Fortunately, Node-API provides several APIs for converting from JavaScript objects to native C data types. In particular, we're interested in converting JavaScript numbers to C integers. The following code snippet defines the number of expected input arguments, retrieves those arguments from the provided callback info using napi_get_cb_info, and converts JavaScript objects to native C data types using napi_get_value_int32.

/**
* Receives JavaScript callback invocation data.
*
* @param env    environment under which the function is invoked
* @param info   callback data
* @return       Node-API value
*/
static napi_value addon( napi_env env, napi_callback_info info ) {
    napi_status status;

    // Define the expected number of input arguments:
    size_t argc = 2;

    // Retrieve the input arguments from the callback info:
    napi_value argv[ 2 ];
    status = napi_get_cb_info( env, info, &argc, argv, NULL, NULL );
    assert( status == napi_ok );

    // Convert each argument to a signed 32-bit integer:
    int x;
    status = napi_get_value_int32( env, argv[ 0 ], &x );
    assert( status == napi_ok );

    int y;
    status = napi_get_value_int32( env, argv[ 1 ], &y );
    assert( status == napi_ok );

    // Call the Fortran routine:
    int res;
    mul( &x, &y, &res );

    // ...

    return NULL;
}

And finally, we need to convert the integer result to a JavaScript object for use within JavaScript. The following code snippet adds logic for converting a C signed 32-bit integer to an opaque object representing a JavaScript number using napi_create_int32.

/**
* Receives JavaScript callback invocation data.
*
* @param env    environment under which the function is invoked
* @param info   callback data
* @return       Node-API value
*/
static napi_value addon( napi_env env, napi_callback_info info ) {
    napi_status status;

    // Define the expected number of input arguments:
    size_t argc = 2;

    // Retrieve the input arguments from the callback info:
    napi_value argv[ 2 ];
    status = napi_get_cb_info( env, info, &argc, argv, NULL, NULL );
    assert( status == napi_ok );

    // Convert each argument to a signed 32-bit integer:
    int x;
    status = napi_get_value_int32( env, argv[ 0 ], &x );
    assert( status == napi_ok );

    int y;
    status = napi_get_value_int32( env, argv[ 1 ], &y );
    assert( status == napi_ok );

    // Call the Fortran routine:
    int res;
    mul( &x, &y, &res );

    // Convert the result to a JavaScript object:
    napi_value out;
    status = napi_create_int32( env, res, &out );
    assert( status == napi_ok );

    return out;
}

Putting it all together, we have the following addon.c file which defines the entirety of our native add-on bindings.


// file: addon.c

#include "mul_fortran.h"
#include <node_api.h>
#include <assert.h>

/**
* Receives JavaScript callback invocation data.
*
* @param env    environment under which the function is invoked
* @param info   callback data
* @return       Node-API value
*/
static napi_value addon( napi_env env, napi_callback_info info ) {
    napi_status status;

    // Define the expected number of input arguments:
    size_t argc = 2;

    // Retrieve the input arguments from the callback info:
    napi_value argv[ 2 ];
    status = napi_get_cb_info( env, info, &argc, argv, NULL, NULL );
    assert( status == napi_ok );

    // Convert each argument to a signed 32-bit integer:
    int x;
    status = napi_get_value_int32( env, argv[ 0 ], &x );
    assert( status == napi_ok );

    int y;
    status = napi_get_value_int32( env, argv[ 1 ], &y );
    assert( status == napi_ok );

    // Call the Fortran routine:
    int res;
    mul( &x, &y, &res );

    // Convert the result to a JavaScript object:
    napi_value out;
    status = napi_create_int32( env, res, &out );
    assert( status == napi_ok );

    return out;
}

/**
* Defines the Node.js module "exports" object for the native add-on.
*
* @param env      environment under which the function is invoked
* @param exports  exports object
* @return         Node-API value
*/
static napi_value Init( napi_env env, napi_value exports ) {
    napi_value fcn;

    // Export the add-on function as a "default" export:
    napi_status status = napi_create_function( env, "exports", NAPI_AUTO_LENGTH, addon, NULL, &fcn );

    // Verify that we successfully wrapped the `addon` function as a JavaScript function object:
    assert( status == napi_ok );

    // Return the JavaScript function object to allow registering with the JavaScript runtime:
    return fcn;
}

/**
* Register a Node-API module which exports a function.
*/
NAPI_MODULE( NODE_GYP_MODULE_NAME, Init )

To confirm that our Node.js add-on still compiles, we can re-run our build sequence defined above.

$ ./node_modules/.bin/node-gyp clean && \
  ./node_modules/.bin/node-gyp configure && \
  ./node_modules/.bin/node-gyp build

Creating a JavaScript file importing the native add-on

We're here! The moment that we've been waiting for! Time to create a JavaScript file which loads our Node.js native add-on and calls its public API. 🥁

Thankfully, loading a native add-on is just like loading any other JavaScript module. To see this in action, let's create a mul.js file which imports the native add-on module, calls the function exposed by the add-on, and prints the result.

// file: mul.js

// Import the native add-on module:
const addon = require( './build/Release/addon.node' );

// Compute the product of two integers:
const res = addon( 5, 10 );
console.log( 'The product of %d and %d is %d', 5, 10, res );

To test whether everything works as expected, we can run the script by passing the script's file path to the Node.js executable.

$ node ./mul.js

If all went according to plan, you should see the following text as output

The product of 5 and 10 is 50

That's it! We did it. 😅

Barring any platform quirks or dreaded compiler errors, we successfully called a Fortran routine from JavaScript. 🙌

Simplifying add-on authoring with stdlib

Depending on API complexity, authoring Node.js native add-ons can be verbose and error prone. This verbosity largely stems from the need for argument validation logic and status checks. For example, when handling typed arrays, one needs to perform multiple steps, such as verifying that an input argument is a typed array, verifying that an input argument is a typed array of the correct type, resolving the length of a typed array, converting a JavaScript object representing a typed array to a C pointer pointing to the start of the underlying typed array memory, and, for applications involving strided arrays, ensuring that typed array properties are consistent with other input arguments, such as strides and offsets.
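
As a rough illustration of the last of these steps, the sketch below checks that a strided access pattern stays within a buffer. The function name strided_access_in_bounds and its parameter layout are assumptions made for this example and are not part of stdlib or Node-API; the sketch also assumes values small enough that the index arithmetic does not overflow a 64-bit integer.

```c
#include <stdbool.h>
#include <stdint.h>

/*
* Hypothetical validator: returns true if accessing N elements at indices
* offset, offset+stride, offset+2*stride, ... stays within a buffer of `len` elements.
*/
bool strided_access_in_bounds( int64_t N, int64_t stride, int64_t offset, int64_t len ) {
    if ( N < 0 || len < 0 ) {
        return false;
    }
    if ( N == 0 ) {
        // No elements are accessed, so there is nothing which can go out of bounds:
        return true;
    }
    // Index of the last accessed element:
    int64_t last = offset + ( ( N - 1 ) * stride );

    // For negative strides, the last accessed element has the smallest index:
    int64_t min = ( stride < 0 ) ? last : offset;
    int64_t max = ( stride < 0 ) ? offset : last;

    return ( min >= 0 && max < len );
}
```

In an add-on, a check along these lines would run after resolving the typed array length and before handing the underlying pointer to native code.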

While some validation logic can be performed in JavaScript or omitted entirely, a general best practice is to include such logic in order to ensure data integrity when calling APIs outside of Node-API and to avoid hard-to-track-down bugs leading to segmentation faults and buffer overflows. Furthermore, best practice dictates checking the napi_status return value after each invocation of a Node-API function to ensure that the JavaScript engine was able to successfully perform the requested operation. As a consequence, lines of code add up, and you find yourself writing the same logic over and over.
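
To make the repetition concrete, here is a minimal sketch of how a macro can fold a status check into a single line. Everything in it is hypothetical: demo_status stands in for napi_status, and DEMO_TRY, demo_to_int32, and demo_multiply are illustrative names chosen so that the sketch compiles without the Node.js headers. It is not how stdlib or Node-API implement their checks.

```c
#include <stdint.h>

/* Hypothetical stand-in for `napi_status`. */
typedef enum { DEMO_OK = 0, DEMO_INVALID_ARG = 1 } demo_status;

/* Hypothetical macro: evaluate a call and return early from the enclosing function on failure. */
#define DEMO_TRY( expr )               \
    do {                               \
        demo_status s_ = ( expr );     \
        if ( s_ != DEMO_OK ) {         \
            return s_;                 \
        }                              \
    } while ( 0 )

/* Hypothetical conversion step: fails for NaN and for values outside the signed 32-bit range. */
demo_status demo_to_int32( double v, int32_t *out ) {
    if ( !( v >= -2147483648.0 && v <= 2147483647.0 ) ) {
        return DEMO_INVALID_ARG;
    }
    *out = (int32_t)v; // truncates any fractional part
    return DEMO_OK;
}

/* Each conversion is checked without writing out the same `if` block by hand. */
demo_status demo_multiply( double a, double b, int32_t *out ) {
    int32_t x;
    int32_t y;
    DEMO_TRY( demo_to_int32( a, &x ) );
    DEMO_TRY( demo_to_int32( b, &y ) );

    // Compute in 64 bits so that an out-of-range product can be detected:
    int64_t p = (int64_t)x * (int64_t)y;
    if ( p < INT32_MIN || p > INT32_MAX ) {
        return DEMO_INVALID_ARG;
    }
    *out = (int32_t)p;
    return DEMO_OK;
}
```

On failure, the output argument is left untouched, so callers can rely on the status code alone to decide whether the result is usable.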

Macros for module registration and data type conversion

To simplify add-on authoring, stdlib provides several utilities, both functional APIs and macros, which abstract away common boilerplate. For example, we can refactor the addon.c file defined above to use stdlib's napi macros for retrieving input arguments, handling conversion to and from native C data types, and initializing and registering an exported function with Node.js.

// file: addon2.c

#include "mul_fortran.h"
#include "stdlib/napi/create_int32.h"
#include "stdlib/napi/argv_int32.h"
#include "stdlib/napi/argv.h"
#include "stdlib/napi/export.h"
#include <node_api.h>

static napi_value addon( napi_env env, napi_callback_info info ) {
    STDLIB_NAPI_ARGV( env, info, argv, argc, 2 ); // retrieve function arguments
    STDLIB_NAPI_ARGV_INT32( env, x, argv, 0 );    // convert to C data type
    STDLIB_NAPI_ARGV_INT32( env, y, argv, 1 );    // convert to C data type
    int res;
    mul( &x, &y, &res );
    STDLIB_NAPI_CREATE_INT32( env, res, out );    // convert to JavaScript object
    return out;
}

STDLIB_NAPI_MODULE_EXPORT_FCN( addon )

Specialized macros for common function signatures

The use case explored in this post (namely, calling a C/Fortran function which operates on and returns scalar values) is something that we do quite often in stdlib, especially for testing native C APIs and sharing test logic across JavaScript and C implementations. Accordingly, stdlib provides several more macros which abstract away all argument retrieval, argument validation, and module registration logic for certain input/output data type combinations.

// file: addon3.c

#include "mul_fortran.h"
#include "stdlib/math/base/napi/binary.h"

static int multiply( const int x, const int y ) {
    int res;
    mul( &x, &y, &res );
    return res;
}

STDLIB_MATH_BASE_NAPI_MODULE_II_I( multiply )

Two comments regarding the code above:

STDLIB_MATH_BASE_NAPI_MODULE_II_I is a macro for registering a Node-API module for an exported function accepting two signed 32-bit integer input arguments and returning a signed 32-bit integer output value. This signature is encoded in the macro name as II_I.
We need to wrap the Fortran routine in a C function, as the module registration macro assumes that a registered II_I function expects arguments to be passed by value, not by reference, and returns a scalar value.

Learning from real-world examples in stdlib

For more details on how we author Node-API native add-ons and leverage macros and various utilities for simplifying the add-on creation process, the best place to start is by browsing stdlib source code. For the examples explored in this post, we've brushed aside some of the complexity in ensuring cross-platform configuration portability (looking at you, Windows!) and in specifying compiler options for optimizing compiled code. For those interested in learning more, you'll find many more examples throughout the codebase, and, if you have questions, don't be afraid to stop by and say hi! 👋

Conclusion

In this post, we explored several aspects of authoring Node.js native add-ons, with a particular eye toward being able to call Fortran routines from JavaScript. This effort involved compilation and linking, writing C interfaces, module registration, and build configuration. Along the way, we relied on a variety of tools for generating build artifacts, including Fortran and C compilers, Node-API, and node-gyp. We touched on best practices and potential pitfalls, and we observed how stdlib can make authoring Node.js native add-ons much easier.

All in all, it was a lot, with several moving parts and complex toolchains. But our exploration was well worth the effort. By leveraging Fortran's high-performance capabilities within Node.js, you can significantly enhance and accelerate your numerical and scientific computing tasks. With Node.js native add-ons, you can bridge the gap between modern web technologies and established scientific computing practices, providing a powerful toolset for you and others and opening the door to new and more powerful Node.js applications.

In future posts, we'll explore more complex use cases, including the ability to leverage hardware-optimized routines for linear algebra and machine learning. There's still a lot to learn and more ground to cover. We hope that you'll continue to follow along as we share our insights and that you'll join us in our mission to realize a future where JavaScript and the web are preferred environments for numerical and scientific computation. 🚀

If you'd like to view the code covered in this post on GitHub, please visit the source code repository.