Since 2012, we have seen the benefits of an open source by default policy in the following outcomes:
Open source software development has been critical at CFPB since we opened our doors. In April 2012, just nine months after the Bureau opened, we released our Source Code Policy, based on the work of the Department of Defense, along with our first two open source projects. Just six days later, we accepted our first pull request from a member of the public.
Our approach to open source in these early days was to publicly release small bits of source code that had potential for wider application. In November of 2012 and January of 2013, our first class of design and development fellows began working at the Bureau. Two of the projects that were developed by this initial class of fellows, the open data platform Qu and eRegulations, set a new precedent: they were developed from day one entirely in the open on GitHub. Shortly afterwards, we formed an open source working group, which worked to put procedures in place for open sourcing code that was only available internally. As more code was moved to the public, the team moved towards a policy of open source by default, where source code is developed in private only in certain well-defined cases, including security risks or when the license is restricted.
With an open source by default policy in place, we’ve been able to collaborate with other government agencies and the public at hackathon events where designers and developers come together to work on open source software:
These experiences led us to create internal guidelines for attending these events and sharing our work there to encourage developer contributions. Though our approach to open source has come a long way, we are continually seeking to improve by:
In March 2016, the White House released a draft federal open source software policy for public comment. Members of the public gave feedback and suggested changes via GitHub.com. To date, the public has contributed 20 merged pull requests and over 200 discussion threads.
If you have thoughts about how we or other federal agencies might continue to evolve our approach to open source software, we encourage you to discuss the Federal Open Source Policy on GitHub.
The CFPB is looking for our next Chief Information Officer (CIO) to lead our technology team in supporting the mission of the CFPB and in setting an exceptional example for how technology can work in government.
The Consumer Financial Protection Bureau (CFPB) is a government agency that helps consumer finance markets work for Americans by making rules more effective, by consistently and fairly enforcing those rules, and by empowering consumers to take more control over their economic lives. From the Bureau’s founding in the wake of the 2008 financial crisis, it has always been clear that we need to leverage modern technology and take a data-driven approach to achieve these goals. The CFPB technology team uses and produces open source code, has embraced systems in the public cloud, and follows an Agile software development model. Our team is made up of talented individuals from both the public and private sectors, and through CFPB’s Technology and Innovation Fellows program, we bring together designers, developers, data scientists, and security professionals to collaborate and solve business problems using technology.
The CIO is the CFPB’s senior leader for all aspects of technology. The role is a hybrid of a traditional CIO, providing services that support the Bureau’s operations, and a CTO, responsible for the technology that amplifies the mission effectiveness of the CFPB.
The CIO oversees the following eight functions, all of which support the Bureau’s mission or operations:
We are looking for a broad and diverse pool of candidates to apply for the position. Strong candidates will have 15+ years of professional experience from the government or private sector (combination of private/public sector experience is highly desirable). Strong candidates will also have exceptional experience in at least two of the three following areas:

The posting for this position has now closed. Thank you for your interest!
The Consumer Financial Protection Bureau (CFPB) is an equal opportunity employer and seeks to create and maintain a vibrant and diverse workforce. Women, minorities, veterans, and people with disabilities are encouraged to apply.
The README Refresh Day helped us improve the quality of our open source software and its ability to be used and adopted by the general public. This effort effectively helps to pave the path for participation and collaboration, and helps to build a community around software. These are key tenets in open source software development, the presence of which helps to ensure software quality and success. The README Refresh Day also presented a number of added benefits: (1) It helped orient new developers to technical environments and processes, and (2) it helped to reinforce an important cultural value: Empathy – empathy for other developers, or anyone in the public who would want to understand our software products. The easier our software is to understand and develop, the faster and more effectively we can deliver value to the American public.
Collaborators on software projects are frequently encouraged to write good installation and usage documentation and keep that information up to date, but even on the strongest of teams, creating and maintaining correct, up-to-date documentation is a real challenge. The documentation often:
A small group of existing CFPB developers organized the event, including planning and event-day activities, documented at https://github.com/cfpb/readme_refresh_day. We scheduled it during the fellowship immersion period as an activity to bring new developers closer to existing developers, and also to provide all developers – veteran and newcomer – with an opportunity to explore software created by other CFPB project teams. It also gave veteran CFPB developers an exercise in humility and empathy, as they watched other developers attempt to install and work on software they had previously created, using just the README.
Our raw notes can be read at https://github.com/cfpb/readme_refresh_day.
During our three-hour session, 25 developers worked on software they had never seen before, using just the README. In the group intro discussion, our new developers contributed valuable insights into what constitutes a good README, and these suggestions will become part of our open source template. Our developers shined a light on out-of-date or completely missing instructions and assumptions made by previous authors. We also looked for missing or broken “badges,” such as Travis CI and Coveralls, and added or fixed those.
Additionally, it was an excellent opportunity for new fellows to get a hands-on introduction to some of CFPB’s open source software projects. If developers who hadn’t seen the software before couldn’t figure out how to install or use it from the README, they had the opportunity to dig into it (albeit briefly) and try to determine what might be missing from the README. At the end, even if they had no answers, they at least had a much better idea of the questions they needed to ask.
The session resulted in 14 pull requests to nine of our open source projects on GitHub, and 21 issues added to these repositories for questions, comments, or enhancement suggestions. We also came away with a humbling sense of the differences in README quality amongst our repositories, and we’ll use this learning to improve across the board.
If you want your developers to learn more about software with which they’re unfamiliar, develop empathy for new users, and improve the welcome mats to your projects, hold a README Refresh Day.
For an activity that requires only a day or two of planning, the results in both teamwork and software/documentation improvements are impressive.
As discussed in our recent article about holding an Iconathon, the design team at the Consumer Financial Protection Bureau has been developing one of the first open source icon libraries released by the federal government. Our collection, called “Minicons”, consists of small-scale icons that visually reinforce an interface action, file type, status, or category. Designed with simplicity in mind, these icons enable users to quickly find and scan content with ease.
The release of the Minicon library reaffirms the Bureau’s commitment to open source by broadening its interpretation to include design assets that can be universally utilized. This opens new doors for government to play a vital role in a growing open source community that seeks to empower the public with information, tools, and resources.
As a newly established agency, the CFPB had the opportunity for new ideas and novel approaches to technology to influence its evolving visual identity. Outreach efforts to communicate with consumers and industry took precedence, creating a need for developing new icons. Given this demand, we reconsidered the management of our existing icon library to streamline its use for print and web-based projects. In keeping with the CFPB ethos of developing unique resources that cater to project needs, we sought to reinterpret our library of icons from a loose set of individual graphics to an easily accessible font. This offers many advantages to our Design & Development Team, including: smaller file size, vector scalability, customizable size and color, cross browser compatibility, and improved access to our icons through the glyph panel within Adobe design software.
While the end product is valuable in and of itself, the successful adoption of a transparent and open design process is another goal of ours. Inspired by the common practices of open source developers, our team of designers has worked towards creating a culture shift that welcomes a broader audience into our design process. By leveraging GitHub – a platform primarily created for sharing code – we have established an open critique of visual designs, as illustrated in our newsroom Minicons discussion. Public participation is encouraged and all disciplines are welcome to contribute helpful suggestions. To participate, visit our Design Manual repository on GitHub and provide feedback to any of our open issues.
Although open source code as a new mode of operation has been readily embraced by both government and the private sector, designing for the public domain is still in its nascency. One organization providing a venue for open design is The Noun Project, an online repository of icons that provides designers with a platform to share their work and dedicate it to the public domain. In line with this objective, we have contributed our icons to this growing library, demonstrating our commitment to open source within the design community. We look forward to continuing our efforts in promoting these modalities of transparency that are not yet common practice among graphic designers. It is our hope that openness becomes part of the design vernacular, giving way to greater connectivity and sharing of ideas.
If you would like to access our Minicon library, it is publicly available as an icon font from the CFPB Design Manual and as individual vector source files from The Noun Project.
The new booklet from CFPB was designed in two sizes.
The CFPB has released a new booklet to help Americans navigate the process of shopping for a mortgage and buying a house. The booklet aligns with the new Know Before You Owe mortgage disclosure rule and includes new information about how to use the Loan Estimate and Closing Disclosure forms. It will reach consumers applying for certain mortgage loans starting October 3.
One of the primary goals of rewriting and redesigning this booklet, called Your Home Loan Toolkit, was to create a resource that would invite consumers to open, read, and learn from it. With this booklet, the CFPB has a chance to directly deliver unbiased and understandable information to people at a key moment in their financial lives. It was designed by the in-house team at CFPB to be consumer-friendly and fully accessible. The new booklet is written in plain language and features a clear, clean design. We ran usability tests with consumers and their feedback informed the final design. And, taking mortgage industry practices into consideration, we designed three formats of the booklet so it will reach more Americans without being redesigned by lenders.
The toolkit replaces an existing booklet which was initially developed by the Department of Housing and Urban Development (see an earlier HUD iteration here). Creditors must provide the toolkit to home purchase mortgage applicants, potentially reaching millions of Americans each year. We completely rewrote the booklet and reduced the page count from 71 pages to 25 pages while incorporating many new statutory requirements. The new design has a simple structure and clear typographic hierarchy that flows logically and guides the reader through the content. Headers, large type sizes, and white space clearly organize the information. The toolkit features interactive worksheets and checklists, conversation starters for discussions between consumers and lenders, and research tips to help readers seek out and find important information.
The new toolkit features a two-page worksheet to help consumers calculate what they can afford.
We designed the cover to catch consumers’ attention, appeal to the end goal, and reach all types of homebuyers. We went through many iterations but ultimately settled on the simple moment of getting the keys to your new home.
As we designed the booklet, we completed three rounds of qualitative usability testing in person, across 12 different cities. For all the sessions, we recruited consumers who met specific demographic and socioeconomic criteria, and were first time homebuyers who had recently taken out a mortgage or were planning to apply for a mortgage soon. We designed the first round of testing, in August 2014, to inform the overall concept of the booklet. We tested six-page sample booklets using three general ideas: a version that broke sections into time-boxed tasks, a version that focused on present and future goals, and the toolkit version.
The three covers tested in round one, representing different booklet concepts.
Elements of each concept ended up in the final design, but the nine consumers consulted preferred the toolkit concept. Next, in October, we wrote and designed the full toolkit and interviewed six new homebuyers, seven lenders, and nine settlement agents to provide direction on the accuracy, completeness, and understandability of the booklet. Finally, in January 2015, we validated the new booklet against the old booklet with 33 consumers to make sure that it was working better. During these sessions, more than 90% of consumers preferred the new version of the booklet.
Under Regulation X, which implements the Real Estate Settlement Procedures Act, lenders may use any color, size and quality of paper, type of print, and method of reproduction so long as the booklet is clearly legible. Lenders are also permitted to change the cover design. In the past, vendors who supply lenders with the old booklet have reformatted it to fit into a small business envelope for easy and cost-effective mailing. These versions of the booklet used a very small type size and limited margins and white space. They were printed on thin paper using only two colors. Because we want people to get this information with optimized design and typography, we made a smaller version of the booklet that matches the two-color, business-envelope size specification. Redesigning the booklet for a smaller size was just like designing a website for desktop and mobile screen sizes. We hope that lenders and their vendors who want to save on mailing costs adopt this version instead of redesigning their own.
Earlier versions of the booklet reformatted by vendors featured small type sizes, tight margins, and little white space.
Though the majority of booklets are still delivered in print, many are now delivered electronically. We created an electronic version complete with fillable text fields and interactive check boxes so consumers can save and print their progress as they work through the toolkit. The electronic version meets federal accessibility standards to ensure that all Americans, including those with disabilities, can use the resource. To meet the standards, we identified content by how it is used within the document, set the reading order and tabbing order to flow logically, added internal navigation and links, included proper metadata, incorporated accessibility-friendly design patterns, and added descriptive text to all non-text content—like images and form fields.
Finally, we produced Spanish language versions of the large print-ready and 508-compliant PDF formats of the booklet.
You have the right to clear information before committing to a mortgage, possibly the biggest financial decision of your life. People with disabilities and Spanish language readers should have equal access to this information. We think you will benefit from the new toolkit’s plain language and organized design to help you navigate the complicated process of shopping for a mortgage and buying a home. Our online companion to the toolkit, Owning a Home, will help you go even deeper into the home-buying process. Please share the toolkit with friends and family who are planning to buy a home, and leave feedback on the design below.
On the right, the new Loan Estimate form. On the left, a web page that helps to explain it.
Starting October 3, homebuyers will see two new forms during the mortgage process. One, the Loan Estimate, will help consumers understand the estimated costs and conditions of every mortgage they consider. The other, the Closing Disclosure, will give consumers a final overview of their mortgage terms at closing.
Both forms were designed by the CFPB to be clear, concise, and consumer friendly. We put the forms through several rounds of user testing to make sure the average homebuyer can understand the often complex terms and calculations shown on both forms.
But we know that no form can possibly answer every question homebuyers will have about their mortgage, so we went one extra step. Both forms feature a URL that homebuyers can visit to get help understanding the forms.
The URL on a paper Loan Estimate form.
Those URLs point to a set of tools that help answer consumers’ questions about the forms. These tools went through a design and usability testing process, and we’re super excited to launch them right alongside the forms on October 3.
When we started designing these tools, we had a very clear use case in mind: a homebuyer with a form in one hand and a mouse in the other.
A homebuyer looking at both the paper form and the web page that explains it.
Though the specifics of the situation might change (the form might be a PDF open in a browser tab or the mouse might be a tablet), the big idea is the same: most of our users are going to be looking for help with a specific part of their form by looking for a specific part of their form.
That’s why we designed the interface with the forms themselves front and center on screens of all sizes.
Thanks to previous work on the Owning a Home project, we had lots of research to guide us on what parts of the form might be confusing and what homebuyers need to watch out for (e.g., making sure the address listed on the form is exactly correct). Those areas are all highlighted in the Loan Estimate and Closing Disclosure tools, and a quick click or tap pulls up info that helps answer homebuyers’ questions.
The Closing Disclosure tool also highlights areas of that form that homebuyers should compare to their original Loan Estimate form to see if any mortgage costs or conditions have changed (the forms were specifically designed to be compared side by side). The tool helps homebuyers understand what changes are allowed under current mortgage regulations.
We tested the tools with potential homebuyers to make sure our design made sense. Those users helped us see issues we wouldn’t have spotted otherwise, like how confusing it was to lump simple definitions of terms on the form in with more complex explanations of what to watch out for. That kind of usability testing data let us continuously refine the tools’ interface.
The universe of forms that need explaining isn’t limited to Loan Estimates and Closing Disclosures. Those may be the particular forms we’ll be explaining on October 3, but we built the technology to be flexible enough to handle just about any kind of form.
There are three things needed to explain most paper forms on the web: an image of each form page, content explaining the different parts of the form, and a map that ties the part of the form to be explained to the correct piece of content. Together, those three things can create an interface that shows the form with hotspots positioned over each part to be explained. Tapping or clicking those hotspots shows the associated content.
Our tools use an HTML file for each page of the form to be explained. That file contains only a data object with two properties, img and terms. For example:
data = {
"img": url_for('static', filename='img/form-page.png'),
"terms":
[
{
"term": "Estimated Closing Costs",
"definition": "<p>Upfront costs you will be charged to get your loan and transfer ownership of the property.</p>",
"id": "estimated-closing",
"category": "definitions",
"left": "6.95%",
"top": "85.20%",
"width": "23.26%",
"height": "4.39%"
}
/* The rest of the terms being explained */
]
}
The img property is just a path to the image file of the form page being explained. The terms array is where the real action happens. It contains all the information needed to explain a particular part of the form, like the name, definition, and position of the explained part.
Structuring explainer pages like this gives a couple of benefits. First, all the content is in one chunk, so pages can be created and edited without digging through masses of HTML. Even better, though, is how the simple structure of the data lets non-technical folks write and tweak definitions without needing to understand anything more than very simple HTML.
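To make the idea concrete, here is a minimal sketch of how a page’s data object could be turned into hotspot markup. This is not the CFPB’s production code; the class names ("hotspot", "form-page") and the exact markup are illustrative assumptions, but the percentage-based positions come straight from the terms array:

```javascript
// Illustrative sketch, not the production CFPB code: turn a page's
// data object into markup for a form image overlaid with hotspots.
// Class names ("hotspot", "form-page") are assumptions for this example.

// Build one absolutely positioned, percentage-based hotspot per term,
// so each hotspot scales along with the form image beneath it.
function hotspotHtml(term) {
  return '<a class="hotspot" href="#' + term.id + '"' +
    ' style="left:' + term.left + ';top:' + term.top +
    ';width:' + term.width + ';height:' + term.height + '"></a>';
}

// Wrap the form image and all of its hotspots in a single container.
function explainerHtml(data) {
  return '<div class="form-page">' +
    '<img src="' + data.img + '" alt="Form page">' +
    data.terms.map(hotspotHtml).join('') +
    '</div>';
}

// A one-term page object with the same shape as the example above.
var page = {
  img: 'img/form-page.png',
  terms: [{
    term: 'Estimated Closing Costs',
    id: 'estimated-closing',
    category: 'definitions',
    left: '6.95%', top: '85.20%', width: '23.26%', height: '4.39%'
  }]
};

console.log(explainerHtml(page));
```

A click or tap handler keyed off each hotspot’s id can then look up the matching term object and display its definition.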
We’re excited to launch our new Loan Estimate and Closing Disclosure tools on October 3. We’re just as excited about the future of the form explainer concept. Like much of our work at the CFPB, the code for these tools is open source and free for anyone to use or adapt for their own projects. The code is all on GitHub, and we’d love to see it used and improved.
The Owning a Home site demystifies mortgage jargon, provides mortgage shoppers with detailed information on loan options, and makes a closing checklist and guide to closing forms available for download to help them at the closing table. Our previous site helped consumers learn the range of interest rates they could expect. What was missing was a clear context for all of these tools: when should home buyers access them, and what else should they be thinking of when interacting with all this information?
During our interviews with homebuyers, we found that consumers don’t always have a clear understanding of the process of buying a home. They desired clear action steps and milestones along the way, from preparing to shop for a mortgage to closing.
Owning a Home now addresses this need by presenting an overview of the home-buying process, broken down into four phases. Users can navigate the Owning a Home site depending on which phase they’re at in the process: preparing to shop, exploring loan choices, comparing loan offers, and getting ready to close. Each phase contains three main goals and clearly defined action steps to accomplish those goals, helping home buyers understand and move confidently through the process to own a home.
The four phases help consumers as they move through the process:
With this guide, we had three main goals:
These goals presented a few challenges for our designers. The home-buying process is complicated: what’s the right balance between providing enough information for consumers to be successful without overwhelming them? How do we make it easy for consumers to figure out where they currently are in the process? We don’t expect all or even most of our users to come to our website at the very beginning of their home-buying journey. And how do we keep this action-oriented, so people can use this guide to take concrete steps on their path toward owning a home?
These were the questions we considered as we designed the first iteration of the tool. We then shared our first design prototype with people currently buying a home to get feedback—and see whether we were successful in meeting our goals.
First iteration of a phase page layout.
Our early usability testing revealed that our prototype did not include enough information for users to make sense of the process. Users requested more information about what the action steps meant, and wanted our tools and resources to be connected to specific action steps. We also discovered that our navigation wasn’t successful—it took users a long time to notice the phases of the process because we had placed them at the bottom of the page.
With these problems now defined, we went back to work. We made the action steps expandable, with additional information, so that consumers could selectively dig into the details without being initially overwhelmed. We embedded tools and links into the action steps to connect users with the extensive resources the CFPB has to offer. We duplicated the navigation on the top and bottom of the page, to make it findable and reinforce where a user is currently located within the process.
Final design for a phase page.
Additional content in the action steps.
This new tool will help home buyers throughout the process of buying a home: from the very start through to closing. Phases and action steps help make sure consumers don’t miss anything important, and our interactive tools give home buyers the information they need to feel more confident throughout the entire home-buying process. In addition to this new tool, we’ve added other tools to help consumers use the new Loan Estimate and Closing Disclosure forms and written new content to help consumers use these resources more effectively. Visit the updated site here!
“When I first applied for the loan modification, I wasn’t behind, so they told me that my chances of getting a loan modification weren’t good, that they needed to see me having some difficulty, maybe being behind a few months, showing that I was having difficulty maintaining my mortgage. Well I did that, ended up getting three months behind and that’s when the notice of foreclosure letters started to come. I just became very disappointed in the whole process… I wish there was more information about how it works.”
We heard this during a usability test in Memphis, Tennessee, last December. I work as a User Experience (UX) Designer for the Consumer Financial Protection Bureau (CFPB), and part of our mandate as a federal agency is to accept complaints and other feedback from the public about financial products and services. My team was in Memphis testing a new prototype for our online consumer complaint form. While we received positive feedback on our design work, our participants’ ongoing struggles made it incredibly clear that we had a lot more work to do.
The mission of the CFPB is to help make consumer finance markets work for Americans by making rules more effective, by consistently and fairly enforcing those rules, and by empowering consumers to take more control over their economic lives. From the Bureau’s founding in the wake of the 2008 financial crisis, it has always been clear that we need to leverage design and technology in order to achieve these goals.
I joined the Bureau in 2013 as part of our first class of Technology and Innovation Fellows, which is a program that our early technology team designed in order to take advantage of design and technology talent outside of Washington, D.C. Fellows serve two-year terms from anywhere in the country with occasional travel to our headquarters in D.C. Our first class of 30 designers and developers quadrupled the size of the existing Design & Development Team and radically changed our ability to serve the Bureau. We continued to grow throughout 2013 and 2014, including the hiring of our 2015 class of fellows, who started in January.
As the larger design and development group has grown, our group of UX designers has been working on the best ways to integrate user research, content strategy, and user-centered design into the Bureau’s initiatives for rulemaking, supervision, enforcement, and consumer engagement.
When I arrived two and a half years ago, employees across the organization already had a basic understanding of and appreciation for user-centered design, prototyping, and usability testing, so stakeholders and subject matter experts were already interested in participating in the design process. We didn’t need to “sell” the concept of user-centered design; our role has been to take it to the next level.
Leisa Reichelt, Director of User Research for the UK’s Government Digital Service, wrote about the importance of this shared ownership of user experience in her article “There is no UX, there is only UX”:
It is really important that no one in the team can point to someone over in the corner and put all the burden of user experience on that guy. No one person, no small group of people can be made responsible for the user experience of a service. It is down to the entire team to achieve this, and we need to drag people into the team who make decisions way before we get on the scene.
Our UX designers, researchers, and content strategists add to this base understanding of UX through many means. We facilitate working sessions, wireframe designs, diagram complex systems, communicate functional specifications, plan user research, and work to maintain consistency across our products, platforms, and breakpoints. Jared Spool, founder of UIE, recently wrote about this balance in “Hiring UX experts versus giving your team their own UX skills”:
UX is a complex field because there are lots of complexities: accessibility, mobile design, working in government agencies, working in an agile development process, and dealing with cross-cultural needs, to name a few. Yet, these are edge cases. 80% of UX work is quite routine and when learned by everyone on the team, creates consistently good designs.
What this means for our UX designers:
In four years, the CFPB has proposed new rules for mortgage servicing, prepaid cards, payday loans, money transfers, and debt collection. We’ve established partnerships with schools, libraries, and financial counselors to help educate and empower consumers. We’ve helped return over $10 billion to the American public as a result of our enforcement actions. As our team of designers and developers has grown, we’ve also launched several products and services to help advance the mission of the Bureau:
For the remainder of 2015, we’re focusing on several initiatives:
Our Design & Development Team manages this work by empowering small, cross-disciplinary teams to collaborate remotely and get things done. Our UX designers work alongside graphic designers, front- and back-end developers, data specialists, and subject matter experts every day, with some of them serving as the scrum master or product owner for their project team. We meet as cross-project “feedback houses” to help spread information among designers, each project team presents their recent work on Friday afternoons, and we use our publicly available Design Manual to maintain design and brand patterns across projects.
The next phase in our growth is hiring an experienced manager to lead our disciplines of interaction design, user research, and content strategy as a mentor, supervisor, technical expert, and visionary advocate for user-centered design. The UX Lead position will have an option for working remotely outside of Washington, D.C., though it will require a significant amount of travel to our headquarters as well as other locations where we conduct user research. We want to find the best possible candidate to help our team grow, so we’re conducting a national search.
We hear from hundreds of American consumers every day who believe that we may be able to help them find a solution to problems that are crippling their day-to-day lives. Designers are a part of the solution here. Please join us.
To learn more about the CFPB’s UX Lead position, email us at [email protected].
]]>This past January, as part of our introduction to the new class of Design & Technology Fellows, we hosted our inaugural iconathon, a half-day workshop to brainstorm and sketch new icon ideas. Subject matter experts from throughout the Bureau overcame their fears of sketching and joined our user experience and graphic designers to generate ideas for nearly forty financial concepts. They were tasked with representing challenging topics like “payday lending,” “getting out of debt,” and even the seemingly insurmountable “resources for intermediaries.”
By the end of a morning packed with sharpies and post-its, teams had sketched nearly 400 different ideas. We held a critique to discuss which approaches provided the best conceptual representation rather than visual polish. Favorite sketches included solutions for “credit card fraud,” “digital currency,” and “building credit.” The leading approaches were recorded for our design team to execute as final icons for our library.
Our new fellows and subject matter experts provided a fresh perspective on many of these complicated topics. For our new fellows, the iconathon was an opportunity to get a sense of our design aesthetic, as well as the general culture of design collaboration and critique at the Bureau. It also gave them a chance to work with our subject matter experts and begin building relationships for future projects.
For our subject matter experts (non-designers), the iconathon also enabled them to broadly engage in the design process itself. Many shared that the experience of stepping back and discussing topics conceptually was refreshing, proclaiming “I wish I could do this every morning!” We hope it sets a precedent for future events, potentially as a way to collaborate across different agencies and directly with the American public.
Our entire icon library is available in our public Design Manual, and we’re excited to share that all of our icons designed to date can be downloaded from the Noun Project! Please check out our Design Manual repository on GitHub if you’d like to get involved by suggesting new icons, contributing designs of your own, or learning the history behind the design of our icons.
Our first version of the feature included a set of decision tree questions and several scenario-based information modules. We knew this would help consumers right away, but we also knew we’d need to make improvements as we received user feedback. We reviewed usage data and user feedback and made the following improvements in time for the end of the school year:
According to usability testing, the Repay Student Debt feature was effective but presented significant usability challenges. We learned that the front page didn’t give users a clear understanding of the features, and the question-and-answer pathway was confusing. In addition, we ran our own analysis of the content layout and found that we weren’t clearly directing users to specific action steps they could take after using the tool. For example, our web traffic data showed that 40 percent of users ended their experience at a module about income-based repayment, a student loan repayment program that allows borrowers to limit the amount they must repay each month based on their income. However, few users took the next step and clicked through to the repayment program website where they could enroll. Our improvements make these kinds of next steps clearer.
While we’re excited to see how all of the changes will create a better experience for consumers dealing with student loan questions, the centerpiece of our update is the new decision tree interface.
In the original version of the tool, each question disappeared once it was answered, leaving users feeling disoriented and uncertain. During usability testing for that version, users asked questions like, What are all these questions about? What’s the relationship between this question and the one I just answered? Wait, what if I want to change my answer to a question?
In addition, the question text was often separated from the answer options by a block of informational text. Our testing showed that users got bogged down or exited in search of further details, failing to reach the helpful information at the end of the decision tree.
We made two critical user interface changes:
Focus attention on the question and its answers by trimming any additional information and moving it to the right side. An example decision tree question:
Before:

After:

Keep the answered questions on-screen, allowing users to see what information they’ve provided and to easily change any answers if necessary.

A better experience for consumers using our tools and better code go hand-in-hand. Not only do we believe this new decision tree interface is easier for consumers to use, it’s also easier for our developers to maintain, thanks to a modular plugin called Decision Stacker that we developed internally.
To make it easy to change the decision tree in the future, Decision Stacker maps every button (and potential decision) to a simple JavaScript object. By changing the object, a developer can change the flow of the decision tree. The object also powers Decision Stacker’s unique “Change Answer” option that is present on every question, allowing a user to essentially “rewind” their choices to any previous decision point. As we test Decision Stacker further, we’ll release the code on GitHub so others can use it and contribute to it, too.
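As a rough illustration of the approach described above, an object-driven decision tree might look something like this. Decision Stacker hasn’t been released yet, so the names, node structure, and “rewind” mechanics below are assumptions, not the actual plugin code:

```javascript
// Hypothetical sketch of a decision tree driven by a plain JavaScript
// object, in the spirit of Decision Stacker. Each key is a decision
// point; each answer maps to the next node's key.
const decisionTree = {
  start: {
    question: "Are your loans federal or private?",
    answers: { Federal: "federal-options", Private: "private-options" },
  },
  "federal-options": {
    question: "Can you afford your monthly payment?",
    answers: { Yes: "stay-on-track", No: "income-based-repayment" },
  },
};

// Walk the tree, recording each choice so it can be "rewound" later.
function createWalker(tree) {
  const history = [];
  let current = "start";
  return {
    currentQuestion: () => tree[current] && tree[current].question,
    choose(answer) {
      history.push(current);
      current = tree[current].answers[answer];
      return current;
    },
    // Rewind to any earlier decision point, discarding later answers.
    changeAnswer(step) {
      current = history[step];
      history.length = step;
    },
  };
}
```

Because the flow lives entirely in the object, a developer can reorder or add questions by editing data rather than control logic, and the recorded history is what makes a “Change Answer” option on every question cheap to support.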
We hope you check out the improved Repay Student Debt feature on Paying for College and let us know what you think in the comments below.
]]>
Comparison shopping with two proposed short-form disclosures for prepaid cards.
On November 13, 2014, the Consumer Financial Protection Bureau proposed new disclosures for prepaid accounts that will provide consumers with clear, upfront information about the associated costs and risks. These new disclosure requirements are part of our larger proposal to extend many federal consumer protections to prepaid products.
Prepaid accounts are typically loaded with funds by a consumer or by a third party, such as an employer. Consumers can use these products to make payments, store funds, get cash at ATMs, receive direct deposits, and send funds to other consumers. The amount of money consumers loaded onto general-purpose reloadable prepaid cards grew from less than $1 billion in 2003 to nearly $65 billion in 2012.
Currently, each prepaid card company’s retail package discloses different information in different ways. This can be confusing if you’re trying to compare costs between prepaid accounts. When the Bureau spoke with focus groups about prepaid cards, few participants reported doing any formal comparison shopping before purchasing a prepaid card in a retail store.
CFPB’s in-house design and user experience team spent the past year working with our colleagues in Research, Markets, and Regulations to design standardized model disclosure forms that would enable consumers to make more informed choices among prepaid options. We prepared designs and advised on three rounds of usability testing, facilitated by ICF International, which informed the design that the Bureau proposed in its Notice of Proposed Rulemaking.
To reduce information overload on consumers in a retail setting, the proposed rule includes a short and long version of the proposed disclosure. The short-form disclosure would present consumers with a reduced, manageable set of fee and account information, while the proposed long-form disclosure would contain all of the fees and conditions that apply to the account.
In a retail store, the proposed short-form disclosure would appear on the product’s external packaging, and consumers would be able to access the long-form disclosure information before acquiring the account by visiting a website on their mobile phone or by calling a toll-free number.
We used a limited set of design tools in the proposed short-form disclosure to fit in a small space (3.5-inch square) similar to the industry’s current package design. Our aim is to improve the consumer shopping experience through a clear and simple layout, active language, and mobile-friendly access to the long-form disclosure. We designed the proposed disclosure with a single, easy-to-read typeface that is set at minimum font sizes to aid legibility. The proposed disclosure presents fees in two levels of hierarchy with what we believe to be clear, concise language. Horizontal lines are used to direct the eye and whitespace eases reading.

The proposed short-form disclosure design.
The results of CFPB’s usability testing informed many design choices on the proposed disclosure. Listed at the top of the proposed disclosure, in a larger, bold font, are the fees that testing participants identified as being most important to them when shopping for a prepaid account, and that Bureau staff identified as important through its market analysis.
In an earlier iteration of our design, we found that consumers struggled to interpret text that included slashes. For example, the design we proposed describes what we previously referred to as “in-network/out-of-network” as “in-network or out-of-network.”
For the same reason, asterisks and fine print appear sparingly. Early versions of the short-form disclosure design included many conditions for various fees in fine print. As we refined the design and incorporated consumers’ feedback, we decided to propose that financial institutions be required to disclose the highest fee possible, along with a single symbol indicating that the fee could be lower (if applicable). Consumers would be able to reference the long form for the details.
Because the short-form disclosure would not include all fees, we added to the proposal two elements to lessen the risk of hidden fees. First, we left space for a few additional fees that may vary from product to product. These fees represent those most frequently charged to consumers for that particular product. Second, the proposed disclosure would indicate the number of additional fees associated with that product, which, under the proposal, would all be included on the long-form disclosure. We think that this will give consumers a sense of control without feeling overwhelmed in a shopping environment.
We also identified a few suggested requirements that we believe will improve consumers’ in-store access to the electronic long-form disclosure. The proposed rule would require that the URL to the long-form disclosure be less than 22 characters and be meaningfully named. Under the proposal, the electronic long-form disclosure would have to be machine-readable so that consumers can read and retain it using the device that is best for them. In other words, it shouldn’t be presented as a static image.

Reviewing the proposed electronic long form in the store with the proposed short-form disclosure.
Each member of the Bureau’s prepaid accounts team—lawyers, economists, project managers, researchers and designers—brought different strengths and experience to the project. Marrying that expertise with design best practices and technology standards greatly informed the rule-making process. In designing the solution the Bureau has proposed, it was a challenge to maintain a clear and simple visual hierarchy for so many competing pieces of information. We often asked ourselves, “Is this more important than that?” and reminded ourselves that if everything is emphasized, nothing is emphasized.
We hope that our role in designing these proposed disclosures will ultimately help improve the consumer’s experience when shopping for a prepaid card. Take a look, and tell us if you think the proposed design does a better job of disclosing fee information compared to fee information you’ve seen on prepaid card packaging. You can submit a comment until March 23, 2015. We look forward to seeing new disclosures in stores after any final rule is written and goes into effect!
]]>I am pleased to announce that our 2015 Technology & Innovation Fellows have started at the CFPB. We had over 1,000 applicants, months of résumé reviews, and a series of interviews to bring in the 2015 class, which includes 26 new fellows. Many of them are joining the federal government for the first time in their careers. Each new team member will have ample support, as they will be joined by 15 returning fellows from our 2013 class who have signed up for a second two-year term at the CFPB. Our returning fellows represent just over half of our previous class; the others have gone on to new opportunities in the tech community, including with other government agencies, in teaching positions, and in the private sector.
The 2015 fellows come from diverse backgrounds. Some come to us from the private sector, others from government service, and still others from NGOs. One left a job where he programmed robots in an artificial intelligence lab. Some have fairly standard backgrounds in computer science, while others are self-taught developers, and some have advanced degrees in areas like astrophysics and environmental science.
Our new team members are joining the CFPB because they have something in common: They’re excited about our mission and about changing the way technology works in government. For some, this means publicly releasing data to make the market work more efficiently; others are excited to apply user-centered design to financial decision making. Some care about adopting development methodologies that can help government be more nimble, while others want to provide the best tools to help the CFPB’s talented workforce.
Thanks for all of the interest in our program, and congratulations to the new fellows. We’re really excited to see the impact they will have at the CFPB.
Ashwin Vasan is the Chief Information Officer of the CFPB.
]]>With the support of the tech fellows, this year the Bureau launched an easy-to-use online tool that enables consumers to explore public information about the mortgage market. This set of web-based JavaScript applications runs on top of an API we built using our own Clojure-powered tool.
The platform launched in January 2014 and supports Home Mortgage Disclosure Act (HMDA) data from 2007–2012, representing roughly 113 million rows of deidentified mortgage loan application records. Mortgage lenders have collected HMDA data since 1975, but the data was only available online in raw form, limiting its usefulness. This new platform allows users to filter and aggregate the data in ways previously unavailable because of the size and complexity of the data. The data is now easier for the public to access, navigate, and analyze.
We continue to make improvements to the platform, and we’re excited about its potential to make information easier for the public to access and use. That’s where you come in.
Qu (https://github.com/cfpb/qu) is our open source software, written in Clojure, and it is the heart of the Public Data Platform. To continue our important work, we need talented Clojure developers.
Some of our high-priority tasks are:
As a Technology and Innovation Fellow, you can work from anywhere in the U.S., including on-site at our headquarters in Washington, D.C. We put a premium on communication and collaboration, and we’ve learned a lot about successful remote work. You’ll join a team of creative, dedicated developers and designers scattered around the country, and we’ll help you stay connected.
If this mission appeals to you, and if these software challenges excite you, please apply at http://www.consumerfinance.gov/jobs/technology-innovation-fellows/. If you’re not a Clojure developer, but all of this sounds fantastic, take heart: We’re also hiring talented Python and JavaScript developers, graphic designers, UX specialists, and more, and we encourage you to apply.
]]>We’re looking for talented individuals with diverse backgrounds who embrace our mission and are excited about building technology and helping to build our organization. We expect the next group of fellows to begin work in January 2015.
Since the program launched two years ago, fellows have been hard at work applying their talents to build amazing things to help financial products and services work for consumers. Today, I’m proud to share with you some of their work.
Fellows have been instrumental in creating and building:
Looking ahead, the next round of fellows will continue to build on these accomplishments as well as tackle new projects in areas such as building software for our website, developing consumer-friendly tools and materials, and supporting agency cybersecurity functions.
Technology and innovation are fundamental to our ability to achieve our consumer protection mission. If you’re ready to serve the public and help us build amazing things, apply now or sign up here.
Want to learn more? Check us out on GitHub or peruse the rest of this site to learn more about the web applications our current fellows have developed, and check out our Design Reel to see how current fellows have improved the ways consumers interact with the federal government.
]]>The first and foremost difference between the two regulations is that Regulation Z is significantly larger than Regulation E in almost all aspects. In Table 1, you can see the difference between Regulation E and Regulation Z for each type of content. For example, Regulation E has 26 sections while Regulation Z has 54.
| | Regulation E | Regulation Z |
|---|---|---|
| Subparts | 2 | 7 |
| Sections | 26 | 54 |
| Appendices | 2 | 15 |
Table 1: The number of each type of content per regulation.
Regulation E is 1.5 MB on disk, while Regulation Z is more than seven times larger at 11 MB, when both texts are represented as pretty-printed JSON trees (not including images). That’s a lot of text; in comparison, War and Peace by Leo Tolstoy is 3.1 MB. The fact that Regulation Z is a significantly longer regulation than Regulation E drove how we approached the updates and improvements to the tool – from the need to automatically retrieve content changes, to allowing additional types of appendices, to separating the supplement into more manageable chunks.
A primary feature of eRegulations is the ability to view past, current, and future versions of a regulation. Previously, the source content that was fed to the parser to generate each version was created manually. The most significant change we made over the past six months was to automate this process.
Each version of a regulation consists of a series of Federal Register (FR) final rule notices applied to the previous version of the regulation. Each notice describes changes to individual paragraphs of the regulation (think of it like a diff). A change can add, revise, delete, or move a paragraph and looks something like this:
- Section 1026.32 is amended by:
- Revising paragraph (a)(2)(iii)
- The revisions read as follows:
- (a) ***
- (2) ***
- (iii) A transaction originated by a Housing Finance Agency, where the Housing Finance Agency is the creditor for the transaction; or
This example is from https://www.federalregister.gov/articles/2013/10/01/2013-22752/amendments-to-the-2013-mortgage-rules-under-the-equal-credit-opportunity-act-regulation-b-real#p-amd-32
Lines 1 and 2 describe which paragraph has changed, and how it has changed (known as the amendatory instructions). Line 6 shows you how paragraph 1026.32(a)(2)(iii) reads after the revision. A notice can contain multiples of these changes.
Each version of a regulation on our platform is represented on the back end as a data structure (more specifically, an ordered n-ary tree) that represents the entire regulation at that point in time. For each version of Regulation E, we manually read each FR notice and meticulously compiled plaintext versions that were fed to our parser to generate the tree structure. This was possible because Regulation E has three versions consisting of eight FR notices. Regulation Z, on the other hand, has 12 versions and 23 notices. Manual compilation of versions would be inefficient and more prone to error, and it is not a sustainable solution going forward. We wanted to be able to simply start the parser when the next Regulation E or Z notice was published, without having to manually apply the changes from the new notice.
We now automatically compile regulation versions. Each FR notice is processed by parsing the amendatory instructions (what has changed) and the actual changes (how it has changed), matching those up, and compiling the changes into a new version. Each FR notice has a corresponding XML representation – this also drove the conversion of our parser from being text-based to XML-based. This resulted in a far more sustainable application requiring less manual intervention to add an additional regulation.
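At its core, the compilation step applies each parsed change to a copy of the previous version’s tree. The real parser is considerably more involved; the node shape, labels, and function names below are illustrative assumptions, not the actual eRegulations code:

```javascript
// Sketch of compiling a new regulation version by applying a parsed
// amendment to a tree of labeled paragraphs. Node shape is assumed:
// { label, text, children }.
function findNode(node, label) {
  if (node.label === label) return node;
  for (const child of node.children || []) {
    const found = findNode(child, label);
    if (found) return found;
  }
  return null;
}

function applyAmendment(tree, amendment) {
  // Work on a deep copy so earlier versions stay intact.
  const version = JSON.parse(JSON.stringify(tree));
  const target = findNode(version, amendment.label);
  if (amendment.action === "REVISE" && target) {
    target.text = amendment.text;
  } else if (amendment.action === "DELETE") {
    const parent = findNode(version, amendment.parent);
    parent.children = parent.children.filter(
      (c) => c.label !== amendment.label
    );
  }
  return version;
}
```

Because each notice yields a list of such amendments, folding them over the previous version produces the next version, which is what lets the pipeline run automatically when a new notice is published.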
An individual regulation paragraph can change in a limited number of ways: it can be added, revised, moved, or deleted. Usually, these changes are written with reasonably consistent phrasing, which makes parsing them tractable. Sometimes, however, a change is not expressed as clearly as it could be. Adding rules to the code for each of these exceptions would have diminishing returns: the effort of getting the code correct and tested, and of ensuring it doesn’t break any of the other parsing, would far outweigh the benefit of the unique rule. To handle those special cases, we built a mechanism that lets us keep local copies of the XML notices taken from the Federal Register and edit those copies to make them easier to parse. The parser looks first in our local repository of notices to see if a copy of a required notice exists before downloading it from the Federal Register. This enables us to gracefully handle phrases that aren’t used frequently enough to warrant their own custom rule.
The same mechanism came in handy when we discovered that several notices for Regulation Z had more than one effective date. Notices with the same effective date are what comprise a version of a regulation. The following example illustrates how complicated this can get:
This final rule is effective January 10, 2014, except for the amendments to §§ 1026.35(b)(2)(iii), 1026.36(a), (b), and (j), and commentary to §§ 1026.25(c)(2), 1026.35, and 1026.36(a), (b), (d), and (f) in Supp. I to part 1026, which are effective January 1, 2014, and the amendments to commentary to § 1002.14(b)(3) in Supplement I to part 1002, which are effective January 18,
In these cases, we manually split up the notices, creating a new XML source document for each effective date. This was another situation in which a manual override made the most sense given time and effort constraints.
The types of information the appendices for Regulation Z contain are far more varied than those for Regulation E. First, the structure of the text in the appendices for Regulation Z differs from that of Regulation E. This required a complete re-write of the appendix parsing code to allow for the new format. Secondly, the appendices for Regulation Z contain equations, tables, SAS code, and many images. Each of those presented unique challenges. To handle tables we had to parse the XML that exhaustively represented the tables into something meaningful and concise, and then display that in visually pleasing HTML tables. The SAS code was handled by the same mechanism.
Some of the appendices in Regulation Z contain many images. To speed up page loads for those sections, we re-saved all of the images using formats that compress the content with minimal quality degradation, and we introduced thumbnails. Clicking a thumbnail brings the user to the larger image, but the thumbnails ensure that pages load faster. We also lazy-load the images on scroll to speed up the initial page load. Regulation Z, in its original form, also contains a number of appendices where the images contain text. We pulled the text out of those images so that it is now searchable and linkable, providing a better user experience. With the exception of compiling regulations, most of the changes we made for Regulation Z were a direct result of the fact that Regulation Z is longer.
Supplement I is the part of the regulation that contains the official interpretations to the regulation. Loading Supplement I as a single page worked well for Regulation E (where the content is relatively short) but with Regulation Z this led to a degraded experience as the supplement is significantly longer. We split Supplement I, so it could be displayed a subpart at a time. Displaying the interpretations a subpart at a time was considered a more cohesive experience by our product owner (rather than breaking Supplement I to be read a section at a time). Our code was previously written with the intent of displaying a section at a time (with the entirety of Supplement I considered as a section). This worked nicely because that also reflects how the data that drives everything is represented. With the Supplement displayed a subpart at a time, there is no corresponding underlying data structure that tells us that the following sections of Supplement I should be collected and displayed together. This required a rewrite of some of our display logic. Supplement I is now easier to read as a result.
We made many other changes to the eRegulations tool along the way: introducing a landing page for all the regulations, extending the logic that identifies defined terms within the regulation, and, based on user feedback, introducing help text to the application. Each one of those represents a significant effort, but here we wanted to explain some of the larger efforts. All our code is open source, so you can see what we’ve been up to in excruciating detail (and suggest changes).
Through this set of changes, we’ve hopefully made it easier to navigate, understand, and comply with Regulation Z. Going forward it will also be easier to add future regulations and deal with longer regulations.
]]>To start, what is Qu? Can you give some of the backstory on why we built it?
Qu is an open-source platform to deliver large sets of data. It allows you to query that data, combine it with other data, and summarize that data. We built it because we wanted to serve millions of mortgage application records, and there was nothing out there that could do the same thing on the scale we were looking for. There are some smaller things—Socrata, CKAN’s data tables—and some really large enterprise-y things like Apache Drill, but nothing really in the middle, for serving 10–100 million rows of data easily.
It’s important to note that Qu isn’t just “the CFPB data platform”; it’s a platform for building your own data APIs.
Right; other people can use it for their own data sets that have nothing to do with us.
The work we’re doing right now is to make that as easy as possible.
What’s the difference between Qu and tools like Socrata and CKAN? Is it an alternative to them, or a complement?
Yes and yes? I think it makes a nice complement with CKAN, as CKAN is more focused on being a data catalog, whereas Qu is a data provider. That is, CKAN is great for showing the world your data sets, including sets in non-machine-readable formats, like PDF or Word documents. Qu is good for taking the machine-readable data and putting a simple API on top of it.
The features found on our HMDA tool—those are applications built using the API, not Qu itself, right?
Correct. Those are JavaScript applications based on our mortgage application API, which itself was built using Qu. We’ve built a template that lets you use Qu to build APIs for your own datasets. This is the first step in turning Qu into something like Django or Ruby on Rails—a library you use in your own app, instead of an app by itself.
Socrata and CKAN are applications. You download them and install them. They are like WordPress in this way: a web application you put on your server. You configure the application, but in the end, you have that application.
Qu was like this until recently. The big change we are making is that Qu is becoming a toolkit to build your API with. It doesn’t take much, and you might only have one simple file. For example, here’s the file that runs api.consumerfinance.gov. This was generated by the Leiningen template (linked above). But you can add whatever you want.
This is how Qu has become more like Django or Rails. It makes it infinitely extensible without mucking around in the source code of Qu itself. Right now, we’ve just begun exploring what that can give us.
Elementary question: what’s the benefit to building your own API instead of just using the one that comes out of the box with one of those other products?
To be honest, right now, not a lot, besides that you can benefit from upgrades in Qu’s core software easily. But the end goal will make it matter a lot, because you will be able to pick and choose Qu components—the database, the formats—easily. Qu has always tried to follow the principle that APIs should be discoverable by a human. So, the API has an HTML interface that should let you use the whole thing.
Going back a few minutes to what you were saying about making Qu infinitely extensible, you said you’re just beginning to explore the benefits of this, but you must have had some reason for doing it to begin with.
This fell out of me wanting to make the database interchangeable. Doing that led me to think about the best way to make it switchable through configuration. And I ended up with an application template/builder rather than an application.
So what this means is, if I have a data set that I want to provide an API for, but don’t know how to build APIs, a future version of Qu will let me build a powerful one with relative ease, without forcing me to run it from a database I don’t have.
Right. Exactly.
Cool. Why did you choose Clojure?
Two reasons. First, for dealing with this much data, we need to use all the capabilities of our machines. There are not a lot of languages out there that make using multiple threads easy, and Clojure’s one of the few. (By the way, here’s a curriculum I wrote that explains why Clojure is good at this.)
#2: Clojure is fundamentally about data. It’s not an object-oriented language. Everything in Clojure is a data structure, which fits well when you’re writing programs to transform data.
#3 (I said two, but not true): Clojure is nothing more than a library for the Java Virtual Machine. This lets us use next-level technology while still being able to use all the Java libraries that exist today. In addition, most government and corporate environments know how to deploy Java applications. It’s a nice mix of looking forward without overwhelming our existing infrastructure.
And #4: I like using Clojure. Qu started as a prototype, so I used what I know and love. The prototype grew—like they do—and became the real application.
What have been some of the bigger engineering challenges?
Figuring out how to deliver an arbitrary amount of data was a big deal. If you’re working with our mortgage application data set, you can request any amount of data for download and we will serve it. This is hard, because we have a finite amount of memory and a very large amount of data. You can ask for 4 GB of data and we will serve it, yet we never keep that much data in memory.
Clojure made this fun and easy: its “lazy sequences” not only let us defer processing data until we need it, but also let us garbage-collect data after we’ve used it.
“Garbage-collect data”?
We release the memory the data was using. The only data in memory is the data currently being delivered. Once you’ve got the data, we throw it away. This allows us to service multiple requests for large data sets without exploding. Imagine a window that you can look at a bunch of data through. That window moves over the data, showing only what’s necessary at any given time.
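A rough Python analogue of that windowed, lazy approach uses generators (the function name and chunking here are illustrative, not Qu’s actual code):

```python
def windowed_stream(source, window_size=3):
    """Yield items from `source` in fixed-size windows.
    Only the current window is held in memory; once a window
    has been yielded, it becomes eligible for garbage collection."""
    window = []
    for item in source:
        window.append(item)
        if len(window) == window_size:
            yield window
            window = []  # release the previous window
    if window:
        yield window

# Using a generator as the source means the full data set is
# never materialized, much like Clojure's lazy sequences.
rows = (n * n for n in range(10))
chunks = list(windowed_stream(rows, window_size=4))
# chunks == [[0, 1, 4, 9], [16, 25, 36, 49], [64, 81]]
```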
We serve that data using HTTP streaming, so you don’t have to wait for it all to be ready before you start receiving it.
So if you want to download a big file, we don’t have to tell you, “Okay, sit tight while we generate the data set for you—then you can come back and download your huge file.” Instead, it starts immediately.
Yes. Although for queries that are expensive to compute, we still have to do that today. We’re working on it.
What are some of the things at the top of your to-do list for Qu?
Our roadmap is public. I want to make Qu even easier to customize. Individual organizations should be able to take Qu and add new data backends, new endpoints, and new data formats very easily.
I want to overhaul the way you import data. Right now, it’s complex. You have to know a special format for the data definition. This should be easy to write, or even better, partially inferred.
And I want to continue to make the whole thing pluggable. For example, adding an admin dashboard adds a bunch of complexity for something you might not need. But, having a plugin for an admin dashboard lets you have that or leave it out as you wish.
My biggest goal is to get others using Qu so we can see what they need and they can contribute back. I’d love to see a CKAN/Qu integration. But I’d love to see someone else write it.
If someone else wanted to make some code contributions, where should they focus?
Definitely on data loading. It’s the first part of the app I wrote. It’s both pretty easy to understand and crufty: Here’s a sample data set ready for loading, and here is the definition file. It is huge and gross. I would love proposals on a better format for describing data coming in.
Soon, once things are a little more settled on this modularization, I’d love to see people write database adapters for DBs other than Mongo. Here are the docs on that.
–
Matthew Burton is the former Acting CIO of the Consumer Financial Protection Bureau. Though he has moved back to Brooklyn, he still works with the Bureau’s technology team on a part-time basis.
Clinton Dreisbach is a Clojure and Python hacker for the Consumer Financial Protection Bureau. He is the lead developer on Qu, the CFPB’s public data platform, and a contributor to Clojure, Hy, and other open source projects.
In this article, we’ll touch on a few of the tools we use when parsing regulations.
The Government Printing Office publishes regulations in the Code of Federal Regulations (CFR) as XML bulk downloads. Surely, with a structured language such as XML defining a regulation, we don’t have much to do, right? Unfortunately, as Cornell discovered, not all XML documents are created equal, and the CFR’s data isn’t exactly clean. The Cornell analysis cites major issues with both inconsistent markup and, more insidiously, structure-without-meaning. Referring to the documents as a “bag of tags” conveys the problem well; just because a document has formatting does not mean it follows a logical structure. The XML provided in these bulk downloads was designed for conveying format, rather than structure, meaning header tags might be used to center text and italic paragraphs might imply headings.
In our efforts towards a minimum-viable-product, we chose to skip both the potential hints and pitfalls of XML parsing in favor of plain-text versions of the regulations. Our current development relies more heavily on XML, yet we continue to use plain text in many of our features, as it’s easier to reason about. For the sake of simplicity, this writeup will proceed with the assumption that the regulation is provided as a plain-text document.
Regular expressions are one of the building blocks of almost any text parser. While we won’t discuss them in great detail (there are many better resources available), I will note that learning to write simple regexes doesn’t take much time at all. As you progress and want to match more and more, Google around: regexes are so widely used that it’s basically guaranteed someone has had the same problem.
Regular expressions allow you to describe the “shape” of text you would like to match. For example, if a sentence has the phrase “the term”, followed by some text, followed by “means”, we might assume that the sentence is defining a word or phrase. Regexes give us many tools to narrow down the shape of acceptable text, including special characters to indicate whitespace, the beginning and end of a line, and “word boundaries” like commas, spaces, etc.
"the term .* means" # likely indicates a defined term
"\ba\b" # only matches the word "a"; doesn't match "a" inside another word such as "bad"
Regexes also let us retrieve matching text. In our example above, we could determine not only that a defined term was likely present but also what that term or phrase would be. Expressions may include multiple segments of retrieved text (known as “capture groups”), and advanced tools will provide deeper inspection such as segmenting out repeated expressions.
"Appendix ([A-Z]\d*) to Part (\d+)"
# Allows us to retrieve 'A6' and '2345' from "Appendix A6 to Part 2345"
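In Python, for instance, the same pattern’s capture groups can be retrieved with the standard re module:

```python
import re

# The same pattern as above; parentheses define the capture groups.
pattern = re.compile(r"Appendix ([A-Z]\d*) to Part (\d+)")
match = pattern.search("Appendix A6 to Part 2345")
appendix, part = match.groups()
# appendix == "A6", part == "2345"
```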
Regular expressions serve as both a low-ish level tool for parsing and as a building block on which almost all parsing libraries are constructed. Understanding them will help you debug problems with higher-level tools as well as know their fundamental limitations.
Regulations generally follow a relatively strict hierarchy, where sections are broken into many levels of paragraphs and sub-paragraphs. The levels begin with the lower-case alphabet, then arabic numerals, followed by roman numerals, the upper-case alphabet, and then italic versions of many of these. Paragraphs each have a “marker”, indicating where the paragraph begins and giving it a reference, but these markers may not always be at the beginning of a line. This means that, to find paragraphs, we’ll need to search for markers throughout every line of text.
It’s not a simple matter of starting a new paragraph whenever a marker is found, however. Paragraph markers are also sprinkled throughout the regulation inside citations to other paragraphs (e.g. See paragraph (b)(4)). To solve this issue, we can run a citation parser (touched on shortly) to find the citations within a text and ignore paragraph markers found within them.
There’s also the pesky matter of ambiguity. Many of the roman numerals are identical (in appearance) to members of the lower-case alphabet. Further, when using plain text as a source, all italics are lost, so the deepest layers of the paragraph tree are indistinguishable from their parents. Luckily, we can both keep track of what we have seen before (i.e. what the next marker could be) and peek forward to see which marker follows. If an (i)-marker is followed by a (ii) or a (j), we can deduce exactly which level in the tree the (i) corresponds to.
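That lookahead can be sketched in a few lines of Python (an illustrative simplification, not the project’s actual code):

```python
def disambiguate_i(next_marker):
    """Decide which level an "(i)" marker belongs to by peeking
    at the marker that follows it."""
    if next_marker == "ii":   # (i) then (ii): a roman-numeral level
        return "roman"
    if next_marker == "j":    # (i) then (j): the alphabetic level
        return "letter"
    return "unknown"

level = disambiguate_i("ii")
# level == "roman"
```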
Regular expressions impose additional mental overhead on future developers, who will generally “run” expressions in their heads to see what they do. Well-named expressions help a bit, but the syntax for naming capture groups is generally quite ugly. Further, combining expressions is error-prone and leads to even more indecipherable code. So-called “parser combinators” (i.e. parsers that can be combined) resolve, or at least alleviate, both of these issues. Combinators allow expressions to be named and easily combined to build larger expressions. Below, examples demonstrate these features using pyparsing, a parser combinator library for Python.
from string import digits
from pyparsing import Word

part = Word(digits).setResultsName("part")
section = Word(digits).setResultsName("section")
part_section = part + "." + section
parsed = part_section.parseString("1234.56")
assert(parsed.part == "1234")
assert(parsed.section == "56")
Parser combinators allow us to match relatively sophisticated citations, such as phrases which include multiple references separated by conjunction text. The parameter listAllMatches tells pyparsing to “collect” all the phrases which match our request. In this case, that means we can handle each citation by walking through the list.
# `citation` matches a single reference; `conj_phrase` matches
# connecting text such as ", and". Both are defined elsewhere.
citations = (
    citation.copy().setResultsName("head")
    + ZeroOrMore(conj_phrase
                 + citation.copy().setResultsName("tail",
                                                  listAllMatches=True)))
cits = citations.parseString("See paragraphs (a)(2), (3), and (b)")
for cit in [cits.head] + list(cits.tail):
    handleCitation(cit)
Thus far, we have matched text, searched for markers, and retrieved sophisticated values out of the regulation. I can understand why this might feel like a bit of a letdown — the parser isn’t doing any magic. It doesn’t know what sentences mean; it simply knows how to find and retrieve specific kinds of substrings. While we could argue that this is a foundation of understanding, let’s do something fun instead.
The problem we face is that we must determine what has changed when a regulation is modified. Modifications don’t result in new versions of the regulation from the Government Printing Office (which only publishes entire regulations once a year). Instead, we must look at the “notice” that modifies the regulation (effectively a diff). Unfortunately, the pin-point accuracy we need appears only in English phrases like:
4. Section 1005.32 is amended by revising paragraphs (b)(2)(ii) and (c)(3), adding paragraph (b)(3), revising paragraph (c)(4) and removing paragraph (c)(5) to read as follows
We can certainly parse out some of the citations, but we won’t understand what’s happening to the text with these citations alone. To aid our efforts, let’s focus on the parts of this sentence that we care about. Notably, we only really care about citations and verbs (“revising”, “adding”, “removing”). Citations will play both the roles of context and nouns (i.e. what’s being modified). We can reduce the sentence into a sequence of “tokens”, in this case becoming:
[Citation, Verb, Citation, Citation, Verb, Citation, Verb, Citation, Verb, Citation]
Each Citation token will know its (partial) citation (e.g. paragraph (b)(3) with no section), while each Verb will know what action is being performed as well as the active/passive voice (“revising” vs. “revised”).
We next convert all passive verbs into their corresponding active form by changing the order of the tokens. For example, “paragraph (b) is revised” gets converted into “revising paragraph (b)” in token form. Next, we can carry citation information from left to right. In this sentence, “Section 1005.32” carries context to each of the other paragraphs, filling in their partial citation information.
Finally, we can step through our list of tokens, keeping track of which modification mode we are in. We’d see “Section 1005.32” first, but since we start with no verb/mode set, we will ignore it. We then see “revising” and set our modification mode correspondingly. We can therefore mark each of the next two citations as “modified”. We then hit an “adding” verb, so we switch modes and mark the following citation as “added”. We continue this way, switching modes and marking citations until the whole sentence is parsed.
[Citation[No Verb], Verb == revise, Citation[Revise], Citation[Revise], Verb == add, Citation[Add], Verb == revise ...
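The mode-switching walk described above can be sketched like so (a simplified illustration; the token names and structure are assumptions, not the project’s actual implementation):

```python
def mark_citations(tokens):
    """tokens: a list of ("verb", action) or ("citation", label)
    pairs. Tag each citation with the active verb mode, or None
    if no verb has been seen yet."""
    mode = None
    marked = []
    for kind, value in tokens:
        if kind == "verb":
            mode = value  # e.g. "revise", "add", "remove"
        else:
            marked.append((value, mode))
    return marked

tokens = [
    ("citation", "1005.32"),   # context citation; no verb yet
    ("verb", "revise"),
    ("citation", "(b)(2)(ii)"),
    ("citation", "(c)(3)"),
    ("verb", "add"),
    ("citation", "(b)(3)"),
]
result = mark_citations(tokens)
# [("1005.32", None), ("(b)(2)(ii)", "revise"),
#  ("(c)(3)", "revise"), ("(b)(3)", "add")]
```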
With combinations of just these tools, we can parse a great deal of content out of plain-text regulations, including their structure, citations, definitions, diffs, and much more. What we’ve created has a great many limitations, however. The rule-based approach requires that our developers think up “laws” for the English language, an approach which has proven ineffective in larger projects. Natural language is, in many ways, chaos, and that is where machine learning and statistical techniques shine. In that realm, there is an expectation of inaccuracy simply because the problem is so big.
Fortunately, our task was not so large. The rule-based tools described above are effective with our limited set of examples (a subset of our own regulations). While the probabilistic techniques have, on average, higher accuracy for the general use case, they would not be as accurate as our tailored rules for our use cases. Striking the balance between rules and anarchy is difficult, but in this particular project, I believe we have chosen well.
At the CFPB, we value openness and transparency. Additionally, one of our core values is innovation. Our organization embraces new ideas and technology. We are focused on continuously improving, learning, and pushing ourselves to be great. A natural result is that we are strong proponents of open source software, both using it in our organization and releasing software we build. You can see more details about our philosophy on using and releasing open source software by reading our Source Code Policy.
Along with software development, this website will also feature the CFPB’s design work, which is led by a great team of graphic and user experience designers. We’ll talk about our design process and the value of design in helping consumers to understand the risks and benefits of their financial choices. Too often, financial products, services, contracts, and terms are unfamiliar and confusing. Good design can help consumers make the right choices for themselves and their families.
The CFPB has its own internal design and development teams. The mission of the Bureau is to ensure that markets for consumer financial products and services are fair, transparent, and competitive, and we rely on intense collaboration between the technology team and our expert policy staff to accomplish that mission. Financial products are increasingly technology-based, and having an in-house team of designers and developers at the CFPB enables us to keep pace with today’s market and develop new tools that help consumers succeed in their financial lives.
Our hope is that by releasing as much of our code as possible here on GitHub, we can learn from and share with other agencies and individuals across the country (or even the world), in the spirit of open source software development and open government. We’ve already accepted a pull request from a member of the public.
From time to time, we’ll also discuss this work on consumerfinance.gov, but here you’ll find more detail about choices, processes, and techniques.
At the Consumer Financial Protection Bureau, we are committed to building world-class technology tools for the public and our colleagues. We look forward to sharing our work with you.