uutils at GSoC

Google Summer of Code is:

Google Summer of Code is a global, online program focused on bringing new contributors into open source software development. GSoC Contributors work with an open source organization on a 12+ week programming project under the guidance of mentors.

If you want to know more about how it works, check out the links below.

Useful links:

What is it about?

The uutils project aims to rewrite key Linux utilities in Rust, targeting coreutils, findutils, diffutils, procps, util-linux, and bsdutils. Its goal is to create fully compatible, high-performance drop-in replacements whose reliability is checked against the upstream test suites. Significant progress has been made on coreutils, diffutils, and findutils, while the other utilities are in the early stages of development.

How to get started

Here are some steps to follow if you want to apply for a GSoC project with uutils.

  1. Check the requirements. You have to meet Google's requirements to apply. Specifically for uutils, it is best if you know at least some Rust and are familiar with using coreutils and the other tools.

  2. Reach out to us! We are happy to discuss potential projects and help you find a meaningful project for uutils. Tell us what interests you about the project and what experience you have, and we can find a suitable project together. You can talk to the uutils maintainers on the Discord server. In particular, you can contact:

    • Sylvestre Ledru (@sylvestre on GitHub and Discord)

  3. Get comfortable with uutils. To find a good project you need to understand the codebase. We recommend that you take a look at the code, the issue tracker and maybe try to tackle some good-first-issues. Also take a look at our contributor guidelines.

  4. Find a project and a mentor. We have a list of potential projects you can adapt or use as inspiration. Make sure to discuss your ideas with the maintainers! Some project ideas below have suggested mentors you could contact.

  5. Write the application. You can do this with your mentor. The application has to go through Google, so make sure to follow all the advice in Google's Contributor Guide. Please make sure you include your prior contributions to uutils in your application.

Tips

Project Ideas

This page contains project ideas for the Google Summer of Code for uutils. Feel free to suggest project ideas of your own.

Guidelines for the project list

Summarizing that page, each project should include:

Performance optimization for coreutils

While uutils/coreutils has achieved strong GNU compatibility, some utilities can still benefit from performance improvements to match or exceed the speed of GNU coreutils. This project focuses on identifying performance bottlenecks and implementing optimizations across key utilities.

The goal is to systematically profile, benchmark, and optimize coreutils to ensure they are production-ready for performance-critical environments.

Key areas of work include:

Expand differential fuzzing for coreutils

The uutils/coreutils project has some fuzzing infrastructure in place, but many utilities still lack comprehensive fuzz testing. This project focuses on expanding differential fuzzing coverage across coreutils to identify edge cases, improve robustness, and ensure compatibility with GNU coreutils.

Differential fuzzing compares the behavior of uutils implementations against GNU coreutils to automatically detect discrepancies and bugs that might be missed by traditional testing.
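As a minimal illustration of the differential idea, the sketch below compares two in-process word counters on pseudo-random inputs and reports the first input on which they disagree. It is only a stand-in under stated assumptions: `reference_word_count` plays the role of GNU `wc -w`, `candidate_word_count` plays the role of the uutils version, and a tiny xorshift generator replaces a real fuzzing engine. Differential fuzzing in this project would run the actual binaries and compare their output and exit codes.

```rust
/// Hypothetical reference: count runs of non-whitespace bytes
/// (ASCII whitespace only), standing in for GNU `wc -w`.
fn reference_word_count(input: &[u8]) -> usize {
    input
        .split(|b| b.is_ascii_whitespace())
        .filter(|w| !w.is_empty())
        .count()
}

/// Hypothetical candidate: a hand-written state machine that should agree
/// with the reference, standing in for the uutils implementation.
fn candidate_word_count(input: &[u8]) -> usize {
    let mut words = 0;
    let mut in_word = false;
    for &b in input {
        if b.is_ascii_whitespace() {
            in_word = false;
        } else if !in_word {
            in_word = true;
            words += 1;
        }
    }
    words
}

/// Tiny xorshift PRNG so the sketch needs no external crates (seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Generate `iterations` random inputs and return the first one on which
/// the two implementations disagree, if any.
fn find_discrepancy(seed: u64, iterations: usize) -> Option<Vec<u8>> {
    let mut rng = XorShift(seed);
    // Bias the alphabet toward bytes that matter for word splitting.
    let alphabet = b"ab \t\n\r";
    for _ in 0..iterations {
        let len = (rng.next() % 32) as usize;
        let input: Vec<u8> = (0..len)
            .map(|_| alphabet[(rng.next() % alphabet.len() as u64) as usize])
            .collect();
        if reference_word_count(&input) != candidate_word_count(&input) {
            return Some(input);
        }
    }
    None
}

fn main() {
    match find_discrepancy(0x9E37_79B9_7F4A_7C15, 10_000) {
        Some(input) => println!("mismatch on {:?}", input),
        None => println!("no discrepancy found"),
    }
}
```

Any returned input is a minimal-effort reproducer that can be turned into a regular regression test.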

Key areas of work include:

Complete findutils GNU compatibility

The uutils/findutils project has made significant progress with more than half of the GNU findutils and BFS tests passing. This project focuses on completing the remaining work to achieve full GNU compatibility and production readiness.

The goal is to finish implementing missing features, fix failing test cases, and ensure the utilities (find, xargs, locate, etc.) are fully compatible with their GNU counterparts.

Key areas of work include:

Complete diffutils GNU compatibility

The uutils/diffutils project provides Rust implementations of diff, diff3, cmp, and sdiff. Significant progress has been made, but additional work is needed to achieve full GNU compatibility and handle all edge cases.

This project focuses on completing the remaining features, fixing compatibility issues, and ensuring all utilities pass the GNU test suite.

Key areas of work include:

Complete the Rust implementation of sed

The sed (stream editor) utility is a fundamental Unix tool for parsing and transforming text. A Rust implementation has been started but requires significant work to achieve full compatibility with GNU sed and POSIX standards.

This project focuses on completing the existing Rust sed implementation to make it production-ready. The work involves implementing missing commands and flags, fixing edge cases, improving regular expression support, and ensuring the implementation passes the GNU test suite.

Key areas of work include:

Rust implementation of grep

The goal of this project is to create a high-performance, feature-complete Rust implementation of grep (GNU grep) as part of the uutils ecosystem. While tools like ripgrep exist, this project aims to provide a drop-in replacement for GNU grep with full compatibility, including all command-line options, output formats, and edge case behaviors.

The grep utility is one of the most widely used Unix tools for searching text using patterns. A uutils implementation would need to balance GNU compatibility with the performance advantages that Rust can provide.

Key aspects of the project include:

Rust implementation of awk

The goal of this project is to create a Rust-based implementation of awk, one of the most powerful and widely used text processing utilities in Unix/Linux systems. The awk utility provides a complete programming language for pattern scanning and processing, making it essential for data extraction, report generation, and text transformation tasks.

This implementation would be a standalone project within the uutils ecosystem, similar to how findutils and diffutils are organized. The primary objectives are to achieve compatibility with the POSIX awk specification and GNU awk (gawk) extensions, while leveraging Rust's performance and safety guarantees.

Key aspects of the project include:

Complete procps implementation and GNU compatibility

The uutils/procps project aims to reimplement process and system monitoring utilities in Rust. While initial implementations have been started for various tools, this project focuses on completing the core utilities and achieving production readiness with full GNU compatibility.

This project focuses on completing the most essential procps utilities (ps, top, pgrep, pkill, free, uptime) and ensuring they are ready for real-world usage.
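Tools such as free read their numbers from the Linux /proc filesystem. The following is a minimal sketch of that parsing step for /proc/meminfo, assuming the usual `Name:   value kB` line layout; `parse_meminfo` is a hypothetical helper, not part of the uutils/procps codebase, and the real utilities need more fields, unit handling, and error reporting.

```rust
use std::collections::HashMap;

/// Parse the text of /proc/meminfo into a map from field name to value.
/// Most lines look like "MemTotal:       16303428 kB" (value in kB);
/// unit-less lines such as "HugePages_Total:       0" are kept as plain numbers.
fn parse_meminfo(text: &str) -> HashMap<String, u64> {
    let mut fields = HashMap::new();
    for line in text.lines() {
        let Some((key, rest)) = line.split_once(':') else {
            continue;
        };
        // The first whitespace-separated token after the colon is the number.
        if let Some(value) = rest
            .split_whitespace()
            .next()
            .and_then(|v| v.parse::<u64>().ok())
        {
            fields.insert(key.trim().to_string(), value);
        }
    }
    fields
}

fn main() {
    match std::fs::read_to_string("/proc/meminfo") {
        Ok(text) => {
            let info = parse_meminfo(&text);
            // Print two common fields if the kernel provides them.
            for key in ["MemTotal", "MemAvailable"] {
                if let Some(kb) = info.get(key) {
                    println!("{key}: {kb} kB");
                }
            }
        }
        Err(e) => eprintln!("cannot read /proc/meminfo: {e}"),
    }
}
```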

Key areas of work include:

Complete util-linux implementation and GNU compatibility

The uutils/util-linux project aims to reimplement essential system utilities in Rust. This project focuses on completing the most commonly used util-linux utilities and achieving production-ready status with full GNU compatibility.

This project prioritizes completing utilities that are frequently used in scripts and system administration (dmesg, lscpu, mount, umount, kill, logger).

Key areas of work include:

Complete bsdutils implementation

The uutils/bsdutils project focuses on reimplementing BSD-origin utilities commonly found on Linux systems. This project aims to complete the core utilities and achieve compatibility with both BSD and GNU/Linux variants.

This project focuses on completing essential bsdutils tools like logger, script, column, hexdump, and look, ensuring they work correctly across different Unix-like systems.

Key areas of work include:

Localization

This project would add localization support for formatting, quoting, and sorting in utilities such as date, ls, and sort. For this project, we need to figure out how to handle locale data. The first option is to use the all-Rust ICU4X library, whose data format differs from what distributions usually provide; in that case, a solution could be to write a custom localedef-like command. The second option is to use a wrapper around the C ICU library, which comes with the downside of a C dependency.

This is described in detail in issue #3997 and was also discussed in #1919 and #3584.
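One way to keep the utilities independent of the ICU4X-versus-C-ICU decision is to hide collation behind a small interface. The sketch below assumes a hypothetical `Collator` trait (not an existing uutils API) with only a byte-order backend, which matches the C/POSIX locale; an ICU4X- or ICU-backed type would implement the same trait, and sort or ls would call it instead of comparing bytes directly.

```rust
use std::cmp::Ordering;

/// Hypothetical abstraction: utilities ask a collator to order strings
/// instead of comparing bytes themselves. An ICU4X-based or C-ICU-based
/// backend would implement this same trait.
trait Collator {
    fn compare(&self, a: &str, b: &str) -> Ordering;
}

/// Backend for the C/POSIX locale: plain byte-wise comparison.
struct ByteCollator;

impl Collator for ByteCollator {
    fn compare(&self, a: &str, b: &str) -> Ordering {
        a.as_bytes().cmp(b.as_bytes())
    }
}

/// Sort lines with whichever collator the locale setup selected.
fn sort_lines(lines: &mut [String], collator: &dyn Collator) {
    lines.sort_by(|a, b| collator.compare(a, b));
}

fn main() {
    let mut lines: Vec<String> = ["b", "B", "a", "A"].iter().map(|s| s.to_string()).collect();
    sort_lines(&mut lines, &ByteCollator);
    println!("{:?}", lines); // ["A", "B", "a", "b"]
}
```

In byte order all uppercase ASCII letters sort before lowercase ones; a locale-aware backend would typically interleave them, which is exactly the behavior difference the locale data has to supply.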

procps: Development of Process Management and Information Tools in Rust

This project focuses on creating Rust-based implementations of process management and information tools: ps, pgrep, pidwait, pkill, skill, and snice. The goal is to ensure full compatibility with all options and to pass the GNU tests, maintaining the functionality and reliability of these essential tools.

procps: Development of System Monitoring and Statistics Tools in Rust

This project involves the Rust-based development of system monitoring and statistics tools: top, vmstat, tload, w, and watch. The objective is to achieve full compatibility with all options and to pass GNU tests, ensuring these tools provide accurate and reliable system insights.

procps: Development of Memory and Resource Analysis Tools in Rust

The aim of this project is to develop Rust-based versions of memory and resource analysis tools: pmap and slabtop. The project will focus on ensuring full compatibility with all options and passing GNU tests, providing in-depth and reliable analysis of memory usage and kernel resources.

util-linux: Reimplementation of essential system utilities in Rust

The objective of this project is to reimplement essential system utilities from the util-linux package in Rust. This initiative will include the development of Rust-based versions of various utilities, such as dmesg, lscpu, lsipc, lslocks, lsmem, and lsns. The primary focus will be on ensuring that these Rust implementations provide full compatibility with existing options and pass GNU tests, delivering reliable and efficient system utilities for Linux users.

util-linux: Process and Resource Management: Reimplementation in Rust

This project focuses on reimplementing crucial Linux utilities related to process and resource management in Rust. The target utilities include runuser, sulogin, chrt, ionice, kill, renice, prlimit, taskset, and uclampset. The primary goal is to create Rust-based versions of these utilities, ensuring compatibility with their original counterparts, and validating their functionality with GNU tests.

util-linux: User and Session Management: Reimplementation in Rust

This project focuses on reimplementing essential Linux utilities related to user and session management in Rust. The target utilities include su, agetty, ctrlaltdel, pivot_root, switch_root, last, lslogins, mesg, setsid, and setterm. The primary goal is to create Rust-based versions of these utilities, ensuring compatibility with their original counterparts, and validating their functionality with GNU tests.

This project aims to modernize and enhance critical Linux utilities related to user and session management, ensuring they remain efficient, reliable, and fully compatible with existing systems.

Code refactoring for procps, util-linux, and bsdutils

This project would refactor the Rust-based versions of procps, util-linux, and bsdutils to reduce code duplication.

A multicall binary and core library for findutils

findutils currently consists of a few unconnected binaries. It would be nice to have a multicall binary (like coreutils) and a library of shared functions (like uucore).

This might also require thinking about sharing code between coreutils and findutils.
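A multicall binary typically picks the utility either from the name it was invoked under (for example through a symlink named find) or from its first argument, the way uutils coreutils accepts `coreutils ls`. The sketch below shows only that dispatch step; the binary name findutils and the functions `find_main`, `xargs_main`, `lookup`, and `dispatch` are hypothetical placeholders, not existing uutils code.

```rust
use std::path::Path;

/// Signature shared by every utility entry point: arguments after the
/// utility name in, exit code out.
type UtilMain = fn(&[String]) -> i32;

// Hypothetical stubs; real utilities would live in their own modules.
fn find_main(args: &[String]) -> i32 {
    println!("find called with {:?}", args);
    0
}

fn xargs_main(args: &[String]) -> i32 {
    println!("xargs called with {:?}", args);
    0
}

/// Map a utility name to its entry point.
fn lookup(name: &str) -> Option<UtilMain> {
    match name {
        "find" => Some(find_main as UtilMain),
        "xargs" => Some(xargs_main as UtilMain),
        _ => None,
    }
}

/// Resolve which utility to run: first from the basename of argv[0]
/// (`/usr/bin/find ...`), then from argv[1] (`findutils find ...`).
fn dispatch(argv: &[String]) -> Option<(UtilMain, &[String])> {
    let argv0 = argv.first()?;
    let name = Path::new(argv0).file_name()?.to_str()?;
    if let Some(util) = lookup(name) {
        return Some((util, &argv[1..]));
    }
    let util = lookup(argv.get(1)?)?;
    Some((util, &argv[2..]))
}

fn main() {
    let argv: Vec<String> = std::env::args().collect();
    let code = match dispatch(&argv) {
        Some((util, rest)) => util(rest),
        None => {
            eprintln!("usage: findutils <utility> [args...]");
            1
        }
    };
    std::process::exit(code);
}
```

Keeping the lookup table in one place is also where a shared core library (the uucore equivalent) would hook in common setup such as argument parsing and localization.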

Implementation of GNU Test Execution for procps, util-linux, diffutils, and bsdutils

This project aims to run the GNU test suites against the Rust-based versions of procps, util-linux, diffutils, and bsdutils to verify the compatibility that is crucial for seamless drop-in replacement. We already do this successfully for coreutils, using GitHub Actions, a build script, and a run script.
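After a run, the CI needs a summary of the results. For packages whose test suites run under automake's parallel test harness, each test leaves a `.trs` file containing a `:test-result:` line (PASS, FAIL, SKIP, XFAIL, XPASS, or ERROR). The sketch below tallies those lines; it assumes that harness, which may not apply to every package listed above, and `tally_results` is a hypothetical helper rather than the existing coreutils scripts.

```rust
use std::collections::BTreeMap;

/// Count `:test-result:` lines across the contents of automake `.trs` files.
/// Lines such as `:global-test-result:` do not start with this prefix and are ignored.
fn tally_results<'a>(trs_files: impl IntoIterator<Item = &'a str>) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for contents in trs_files {
        for line in contents.lines() {
            if let Some(result) = line.strip_prefix(":test-result:") {
                *counts.entry(result.trim().to_string()).or_insert(0) += 1;
            }
        }
    }
    counts
}

fn main() {
    // Paths to .trs files, e.g. passed by a CI script after the test run.
    let contents: Vec<String> = std::env::args()
        .skip(1)
        .filter_map(|path| std::fs::read_to_string(path).ok())
        .collect();
    let counts = tally_results(contents.iter().map(String::as_str));
    for (result, n) in &counts {
        println!("{result}: {n}");
    }
}
```

Comparing these counts against the previous run is one simple way to flag regressions in a pull request.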

Symbolic/Fuzz Testing and Formal Verification of Tool Grammars

See Using Lightweight Formal Methods to Validate a Key Value Storage Node In Amazon S3.

Most KLEE scaffolding was done for KLEE 2021.

Start with wc and formalize its command-line grammar. Get it working under AFL++ and KLEE. Add several proofs of resource use and correctness, especially proofs about operating system calls and memory/cache usage. Generalize to other tools. Try to unify the seeds for the fuzzer and KLEE so they can help each other find new paths. Use QEMU to test several operating systems and architectures. Automate detection of performance regressions; in particular, try to hunt for accidentally quadratic behavior.

Specific to wc: formalize the inner loop over a UTF-8 buffer as a finite-state automaton with counters that can generalize to SIMD-width operations, as in simdjson. Then generalize further into a monoid so that K processors can combine their results.
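The monoid idea can be illustrated without SIMD or UTF-8 handling. In this sketch each chunk is summarized by byte, line, and word counts plus whether it starts and ends inside a word; `combine` is associative with the empty-chunk summary as its identity, so chunks processed independently (by K processors, say) can be merged in any grouping. It assumes ASCII whitespace as the word separator, a simplification of GNU wc's locale-dependent rules, and the names `Counts`, `of`, and `combine` are hypothetical.

```rust
/// Per-chunk summary for wc-style counting that combines associatively.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Counts {
    bytes: usize,
    lines: usize,
    words: usize,
    starts_in_word: bool, // first byte belongs to a word
    ends_in_word: bool,   // last byte belongs to a word
}

impl Counts {
    /// Identity element: the summary of an empty chunk.
    const EMPTY: Counts = Counts {
        bytes: 0,
        lines: 0,
        words: 0,
        starts_in_word: false,
        ends_in_word: false,
    };

    /// Summarize one chunk with a two-state automaton (in word / between words).
    fn of(chunk: &[u8]) -> Counts {
        let mut c = Counts::EMPTY;
        let mut in_word = false;
        for (i, &b) in chunk.iter().enumerate() {
            let is_word = !b.is_ascii_whitespace();
            if b == b'\n' {
                c.lines += 1;
            }
            if is_word && !in_word {
                c.words += 1; // transition into a word starts a new word
            }
            if i == 0 {
                c.starts_in_word = is_word;
            }
            in_word = is_word;
        }
        c.bytes = chunk.len();
        c.ends_in_word = in_word;
        c
    }

    /// Associative combine: a word straddling the boundary was counted once
    /// on each side, so subtract one.
    fn combine(self, other: Counts) -> Counts {
        if self.bytes == 0 {
            return other;
        }
        if other.bytes == 0 {
            return self;
        }
        let straddles = self.ends_in_word && other.starts_in_word;
        Counts {
            bytes: self.bytes + other.bytes,
            lines: self.lines + other.lines,
            words: self.words + other.words - straddles as usize,
            starts_in_word: self.starts_in_word,
            ends_in_word: other.ends_in_word,
        }
    }
}

fn main() {
    let text = b"hello world\nfoo bar baz\n";
    // Pretend each 5-byte chunk was summarized on a different processor.
    let total = text.chunks(5).map(Counts::of).fold(Counts::EMPTY, Counts::combine);
    println!("{} {} {}", total.lines, total.words, total.bytes); // 2 5 24
}
```

Splitting the input at any position yields the same totals, which is the property a parallel or SIMD implementation relies on.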

Development of advanced terminal session recording and replay tools in Rust

This project involves creating Rust-based implementations of /usr/bin/script, /usr/bin/scriptlive, and /usr/bin/scriptreplay. The /usr/bin/script command will record terminal sessions, /usr/bin/scriptlive will offer real-time recording features, and /usr/bin/scriptreplay will be used to replay recorded sessions.

The work will happen in https://github.com/uutils/bsdutils.
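The replay side is conceptually simple: a timing file says how long to wait and how many bytes of the recorded output to print next. The sketch below assumes the classic two-column timing format (`<delay-seconds> <byte-count>` per line) and a typescript without a header line; newer util-linux versions also support an advanced timing format, and real typescripts usually begin with a header, neither of which is handled here. `parse_timing`, `parse_line`, and `replay` are hypothetical names.

```rust
use std::io::{self, Write};
use std::thread;
use std::time::Duration;

/// One entry of the (assumed) classic timing format.
#[derive(Debug, PartialEq)]
struct TimingEntry {
    delay: Duration,
    len: usize,
}

/// Parse one "<delay-seconds> <byte-count>" line.
fn parse_line(line: &str) -> Result<TimingEntry, String> {
    let mut parts = line.split_whitespace();
    let delay = parts
        .next()
        .and_then(|d| d.parse::<f64>().ok())
        .filter(|d| d.is_finite() && *d >= 0.0) // Duration cannot be negative
        .ok_or_else(|| format!("bad delay in {line:?}"))?;
    let len = parts
        .next()
        .and_then(|n| n.parse::<usize>().ok())
        .ok_or_else(|| format!("bad length in {line:?}"))?;
    Ok(TimingEntry { delay: Duration::from_secs_f64(delay), len })
}

/// Parse a whole timing file, skipping blank lines.
fn parse_timing(text: &str) -> Result<Vec<TimingEntry>, String> {
    text.lines()
        .filter(|l| !l.trim().is_empty())
        .map(parse_line)
        .collect()
}

/// Write the typescript to `out`, sleeping before each chunk.
/// `speed` (must be > 0) divides every delay: 2.0 replays twice as fast.
fn replay(typescript: &[u8], timing: &[TimingEntry], speed: f64, out: &mut impl Write) -> io::Result<()> {
    let mut offset = 0;
    for entry in timing {
        thread::sleep(entry.delay.div_f64(speed));
        let end = (offset + entry.len).min(typescript.len());
        out.write_all(&typescript[offset..end])?;
        out.flush()?;
        offset = end;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Hypothetical file names; a real tool would take them from the command line.
    let typescript = std::fs::read("typescript")?;
    let timing_text = std::fs::read_to_string("timing")?;
    let timing = parse_timing(&timing_text)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    replay(&typescript, &timing, 1.0, &mut io::stdout().lock())
}
```

Writing to a generic `Write` rather than directly to stdout keeps the replay logic testable against an in-memory buffer.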

Official Redox support

We want to support the Redox operating system, but are not actively testing against it. Since the last round of fixes in #2550, many changes have probably been introduced that break Redox support. This project would involve setting up Redox in CI, fixing any issues that arise, and porting features over.