In R, this is pretty easy to do using Air, a tool created by the Posit folks. Described as “an extremely fast R formatter”, it will format your code every time you hit save. It is best used in conjunction with the tidyverse style guide.
As of the time of writing, auto formatting is ONLY applied to .r and .R files. R chunks inside R Markdown (.Rmd) files are not formatted.
Note: Air is a layout formatter, “ensuring that whitespace, newlines, and other punctuation conform to a set of rules and standards”
Using this tool you can reformat ALL existing code in a repo. Simply run the command air format . in the terminal. Or, if you’d prefer, you can open every R file and hit save. I know which one I’ll choose!
The easiest way to keep the repo formatted consistently is to include an air.toml file in the root directory of your repo. The file can remain empty, or you can add the specific formatting you’d like in your repo. If left empty, air will apply default settings. This is often a good option.
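If you do want to override the defaults, the file stays tiny. A minimal sketch (the specific setting names shown here are assumptions; check the Air documentation for the supported options):

```toml
# air.toml - a hypothetical example; an empty file is also valid
[format]
line-width = 80
indent-width = 2
```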
The benefit of using this air.toml file is that anyone who contributes to your repo will automatically have their contributed code formatted as dictated by the toml file, rather than by any user-level settings. A no-worry solution for consistent formatting of code. The only caveat is that contributors will need to have air installed too, but with the air.toml in the repo, and a pull request template indicating that air formatting is required, this should be painless.
If you want to take things to the next level, you can create a GitHub action to run on all pull requests. The action will check for formatting inconsistencies and fail if formatting is required. If this is of interest see the air documentation
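As a rough sketch of what such an action could look like (the setup action and flags below are assumptions; consult the air documentation for the officially supported workflow):

```yaml
# .github/workflows/format-check.yaml - a hypothetical sketch
name: format-check
on: [pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # install air (installation step is an assumption; see the air docs)
      - uses: posit-dev/setup-air@v1
      # fail the job if any file would be reformatted
      - run: air format . --check
```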
Now all you need to worry about is coding styles, a subject we’ll talk about at a later date.
Issue and Pull Request (PR) templates act as a structured “handshake” between a project maintainer and a contributor. Without them, communication often becomes a messy back-and-forth of “Which version are you using?” or “What does this code actually do?” etc.
Here are a few reasons why they are great for collaborative teams:
Templates act as a checklist that forces contributors to think through their submission.
For Issues: Instead of a report saying “It’s broken,” a template requires a Minimal Reproducible Example (Reprex), environment details (like your R version or OS), and clear steps to recreate the bug.
For Pull Requests: It ensures the author explains the “Why” behind a change, not just the “What.”
Reviewing code is mentally taxing. Templates reduce this “cognitive load” by:
Standardizing Layout: When every PR looks the same, reviewers know exactly where to find the testing instructions, the link to the Jira/GitHub issue, and the “breaking changes” warning.
For someone new to your project, the “New Issue” button can be intimidating.
Templates provide scaffolding. They tell the user exactly what information is valued, making them feel more confident that their contribution will be accepted.
It serves as a subtle way to enforce Contribution Guidelines without making someone read a 2,000-word CONTRIBUTING.md file first.
Six months from now, when you’re wondering why a specific line of code was changed, a well-filled PR template provides the context that a single-line commit message often misses. It captures the intent and the testing process used at the time.
In GitHub, templates can be written in either markdown or YAML files and are saved in standard locations.
You can list as many issue templates as you like. For example, you could include templates for:

- Bug Reports
- Feature Requests
- Data Issues

There is a caveat worth mentioning with pull request templates. If you’d like a default template to appear EVERY time you make a pull request, then you must have a default template called pull_request_template.md residing in the .github folder.
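For illustration, a minimal bug report issue form might look like this (the field names and labels are just an example, not a required schema):

```yaml
# .github/ISSUE_TEMPLATE/bug_report.yml - a hypothetical example
name: Bug Report
description: Report something that isn't working
body:
  - type: textarea
    attributes:
      label: Minimal reproducible example (reprex)
      description: Paste a small, self-contained example that triggers the bug
    validations:
      required: true
  - type: input
    attributes:
      label: R version and operating system
    validations:
      required: true
```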
If you want to include more than this default template, you will need to add additional templates to the PULL_REQUEST_TEMPLATE folder. You will NOT be prompted by GitHub as to which one you want to use; if you have a default, that one will be used. And, as of the date of publication, GitHub does not provide a drop down to select a pull request template. To access the templates in the PULL_REQUEST_TEMPLATE folder you need to append a snippet of additional text to the URL - &template=your_template_name.md.
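A default pull_request_template.md can be as simple as a short checklist; for example:

```markdown
## What does this PR do?

## Why is it needed?

## Linked issue

Closes #

## Checklist
- [ ] Tests pass locally
- [ ] Code is formatted
- [ ] Breaking changes are documented
```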
To see some examples from across the web
And if you want to go one step further … you can create repository templates. These can be set up to contain all of the above templates, a CONTRIBUTING file, a CODE_OF_CONDUCT, a LICENSE, auto formatting, etc., and whenever a repo is created from the template, all of these files are bundled with it! Pretty cool!
In summary, templates are easy to create, human readable, easy to implement, and have been proven to reduce hair loss!
Enjoy!
The data are served up in a consistent way, allowing users to visualize and download data in various standard data formats, like netCDF (nc), csv, txt, and JSON.
While it is perfectly reasonable to interact with these ERDDAP™ servers through a web interface, and often preferable initially to identify data sources, read metadata, and visualize the data, pulling and working with the data is often best accomplished using a language like python or R. Since I mostly use R for my work, we’ll go that route.
For R users there is a wonderful package called rerddap designed to help search, connect, and download data from ERDDAP™ servers. Let’s go through an example to demonstrate how to get started. The steps involved are:
1. Identify the ERDDAP™ server you want to use (rerddap::servers())
2. List the datasets hosted on that server (rerddap::ed_datasets())
3. Pull the metadata for the dataset of interest (rerddap::info())
4. Pull the data itself (rerddap::tabledap() or rerddap::griddap())

First, let’s see the list of server names available using the servers() function
rerddap::servers()
#> # A tibble: 63 × 4
#> name short_name url public
#> <chr> <chr> <chr> <lgl>
#> 1 Voice of the Ocean VOTO http… TRUE
#> 2 St. Lawrence Global Observatory - CIOOS | Observatoi… SLGO-OGSL http… TRUE
#> 3 CoastWatch West Coast Node CSWC http… TRUE
#> 4 ERDDAP at the Asia-Pacific Data-Research Center APDRC http… TRUE
#> 5 NOAA's National Centers for Environmental Informatio… NCEI http… TRUE
#> 6 Biological and Chemical Oceanography Data Management… BCODMO http… TRUE
#> 7 European Marine Observation and Data Network (EMODne… EMODnet http… TRUE
#> 8 European Marine Observation and Data Network (EMODne… EMODnet P… http… TRUE
#> 9 Marine Institute - Ireland MII http… TRUE
#> 10 CoastWatch Caribbean/Gulf of Mexico Node CSCGOM http… TRUE
#> # ℹ 53 more rows
Created on 2025-12-08 with reprex v2.1.1
The “CoastWatch West Coast Node” server looks interesting; let’s explore that. We’ll need to grab the url field to explore the datasets hosted on this server.
servers <- rerddap::servers()
servers |>
dplyr::filter(short_name == "CSWC") |>
dplyr::select(url)
#> # A tibble: 1 × 1
#> url
#> <chr>
#> 1 https://coastwatch.pfeg.noaa.gov/erddap/
Now, at this point you can either copy this url into your browser and explore the contents of the server from there, or you can use rerddap to list the datasets. We should mention that there are two types of data hosted on these servers:

- griddap: gridded data, served in NetCDF (nc) format
- tabledap: tabular data, often in csv format

Let’s search for all of the tabular data on this server.
servers <- rerddap::servers()
url <- servers |>
dplyr::filter(short_name == "CSWC") |>
dplyr::select(url)
rerddap::ed_datasets(which = "tabledap", url = url)
#> # A tibble: 293 × 17
#> griddap Subset tabledap Make.A.Graph wms files Accessible Title Summary
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 "" https://c… https:/… https://coa… "" "" public * Th… "This …
#> 2 "" https://c… https:/… https://coa… "" "htt… public Audi… "Audio…
#> 3 "" https://c… https:/… https://coa… "" "htt… public CalC… "Hydro…
#> 4 "" https://c… https:/… https://coa… "" "htt… public CalC… "Sampl…
#> 5 "" https://c… https:/… https://coa… "" "htt… public CalC… "Cruis…
#> 6 "" https://c… https:/… https://coa… "" "htt… public CalC… "Fish …
#> 7 "" https://c… https:/… https://coa… "" "htt… public CalC… "Egg m…
#> 8 "" https://c… https:/… https://coa… "" "htt… public CalC… "Fish …
#> 9 "" https://c… https:/… https://coa… "" "htt… public CalC… "Size …
#> 10 "" https://c… https:/… https://coa… "" "htt… public CalC… "Devel…
#> # ℹ 283 more rows
#> # ℹ 8 more variables: FGDC <chr>, ISO.19115 <chr>, Info <chr>,
#> # Background.Info <chr>, RSS <chr>, Email <chr>, Institution <chr>,
#> # Dataset.ID <chr>
We can see that there are 293 datasets available, each with its own url field (tabledap), an ID (Dataset.ID), and other accompanying metadata. As mentioned earlier, it may be easier to look at these from within a web browser, but for the sake of this example let’s just look at the Summary metadata
servers <- rerddap::servers()
url <- servers |>
dplyr::filter(short_name == "CSWC") |>
dplyr::select(url)
datasets <- rerddap::ed_datasets(which = "tabledap", url = url)
datasets |>
dplyr::select(Dataset.ID,Summary)
#> # A tibble: 293 × 2
#> Dataset.ID Summary
#> <chr> <chr>
#> 1 allDatasets "This dataset is a table which has a row of information…
#> 2 testTableWav "Audio data from a local source.\n\ncdm_data_type = Oth…
#> 3 erdCalCOFINOAAhydros "Hydrographic data collected by CTD as part of CalCOFI …
#> 4 erdCalCOFIcufes "Samples collected using the Continuous Underway Fish-E…
#> 5 erdCalCOFIcruises "Cruises using one or more ships conducted as part of t…
#> 6 erdCalCOFIeggcnt "Fish egg counts and standardized counts for eggs captu…
#> 7 erdCalCOFIeggstg "Egg morphological developmental stage for eggs of sele…
#> 8 erdCalCOFIlrvcnt "Fish larvae counts and standardized counts for eggs ca…
#> 9 erdCalCOFIlrvsiz "Size data for selected larval fish captured in CalCOFI…
#> 10 erdCalCOFIlrvstg "Developmental stages (yolk sac, preflexion, flexion, p…
#> # ℹ 283 more rows
Now, from experience I know there is a dataset hosted here containing data from the National Data Buoy Center (NDBC). The data has a Dataset.ID = “cwwcNDBCMet”.
datasets |>
dplyr::filter(grepl("NDBC",Summary)) |>
dplyr::pull(Dataset.ID)
[1] "cwwcNDBCMet"
Once we have this ID, we are ready to explore and pull the data. So let’s grab all of the information relating to this dataset using the function info()
data_info <- rerddap::info(datasetid = "cwwcNDBCMet", url = "https://coastwatch.pfeg.noaa.gov/erddap/")
data_info is a list object containing metadata, like the url of the server and a variables data frame describing the variable names, types, and data ranges. This list object is what you’ll pass to the function tabledap() to pull the data.
Even though we have identified the data set we want, we don’t really know how much data there is! If you tried to pull all of the data at once, there is a good chance your computer would crash! So, in preparation, we can explore the set of available variables using a couple of methods that involve the data_info list object:
- rerddap::browse(data_info) will open a webpage on ERDDAP™ describing the dataset
- data_info$variables will list the variables available, and data_info$alldata[[variableName]] will show more detailed information about each variable
erddap_url <- 'https://coastwatch.pfeg.noaa.gov/erddap/'
datasetid <- "cwwcNDBCMet"
data_info <- suppressMessages(rerddap::info(datasetid, url = erddap_url))
data_info$variables
#> variable_name data_type actual_range
#> 1 apd float 0.0, 95.0
#> 2 atmp float -153.4, 50.0
#> 3 bar float 800.7, 1198.8
#> 4 dewp float -99.9, 48.7
#> 5 dpd float 0.0, 64.0
#> 6 gst float 0.0, 75.5
#> 7 latitude float -55.0, 71.758
#> 8 longitude float -177.75, 179.001
#> 9 mwd short 0, 359
#> 10 ptdy float -13.1, 14.9
#> 11 station String
#> 12 tide float -9.37, 6.15
#> 13 time double 4910400.0, 1.7697828E9
#> 14 vis float 0.0, 66.7
#> 15 wd short 0, 359
#> 16 wspd float 0.0, 96.0
#> 17 wspu float -98.7, 97.5
#> 18 wspv float -98.7, 97.5
#> 19 wtmp float -98.7, 50.0
#> 20 wvht float 0.0, 92.39
Created on 2026-01-30 with reprex v2.1.1
What jumps out here are the station, latitude, and longitude variables. We’ll now use these to pull the list of stations available.
erddap_url <- 'https://coastwatch.pfeg.noaa.gov/erddap/'
datasetid <- "cwwcNDBCMet"
data_info <- suppressMessages(rerddap::info(datasetid, url = erddap_url))
rerddap::tabledap(
data_info,
fields = c("station", "longitude", "latitude"),
distinct = TRUE
)
#> info() output passed to x; setting base url to: https://coastwatch.pfeg.noaa.gov/erddap
#> <ERDDAP tabledap> cwwcNDBCMet
#> # A tibble: 1,329 × 3
#> station longitude latitude
#> <chr> <dbl> <dbl>
#> 1 0Y2W3 -87.3 44.8
#> 2 18CI3 -86.9 41.7
#> 3 20CM4 -86.5 42.1
#> 4 23020 38.5 22.2
#> 5 31201 -48.1 -27.7
#> 6 32012 -85.4 -19.6
#> 7 32301 -105. -9.9
#> 8 32302 -85.1 -18
#> 9 32487 -77.7 3.52
#> 10 32488 -77.5 6.26
#> # ℹ 1,319 more rows
Now that you have identified the set of buoy stations available, you can either pull an individual station’s data or pull a collection of stations within a geographic region. Either of these tasks will require the rerddap::tabledap() function. For example:
Select the station(s) of interest, then get the data. In this example we’ll pull all of the data associated with buoy 32012
erddap_url <- 'https://coastwatch.pfeg.noaa.gov/erddap/'
datasetid <- "cwwcNDBCMet"
data_info <- suppressMessages(rerddap::info(datasetid, url = erddap_url))
variables <- data_info$variables$variable_name
rerddap::tabledap(
datasetid,
fields = variables,
query = paste0('station="', 32012, '"')
)
#> <ERDDAP tabledap> cwwcNDBCMet
#> File size: [9.37 mb]
#> # A tibble: 84,918 × 20
#> apd atmp bar dewp dpd gst latitude longitude mwd ptdy station
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int>
#> 1 6.88 NaN NaN NaN 13.8 NaN -19.6 -85.4 215 NaN 32012
#> 2 7.01 NaN NaN NaN 13.8 NaN -19.6 -85.4 253 NaN 32012
#> 3 6.8 NaN NaN NaN 11.4 NaN -19.6 -85.4 202 NaN 32012
#> 4 7.31 NaN NaN NaN 11.4 NaN -19.6 -85.4 200 NaN 32012
#> 5 7.32 NaN NaN NaN 10.8 NaN -19.6 -85.4 190 NaN 32012
#> 6 7.09 NaN NaN NaN 11.4 NaN -19.6 -85.4 204 NaN 32012
#> 7 7.68 NaN NaN NaN 10.8 NaN -19.6 -85.4 207 NaN 32012
#> 8 7.07 NaN NaN NaN 12.9 NaN -19.6 -85.4 235 NaN 32012
#> 9 7.09 NaN NaN NaN 11.4 NaN -19.6 -85.4 219 NaN 32012
#> 10 6.94 NaN NaN NaN 10 NaN -19.6 -85.4 201 NaN 32012
#> # ℹ 84,908 more rows
#> # ℹ 9 more variables: tide <dbl>, time <dttm>, vis <dbl>, wd <int>, wspd <dbl>,
#> # wspu <dbl>, wspv <dbl>, wtmp <dbl>, wvht <dbl>
Now you can work with this data in R for whatever purpose you’d like.
If you want to simply pull all of the buoys in a specific region, you can do this too. ERDDAP™ has a few server-side functions that let you narrow your search. For example, let’s pull all stations within a region along the Northeast USA seaboard (around Cape Cod, MA), between latitudes [41.6, 41.8] and longitudes [-70.5, -69.5]
erddap_url <- 'https://coastwatch.pfeg.noaa.gov/erddap/'
datasetid <- "cwwcNDBCMet"
data_info <- suppressMessages(rerddap::info(datasetid, url = erddap_url))
variables <- data_info$variables$variable_name
rerddap::tabledap(
datasetid,
fields = variables,
'latitude>=41.6','latitude<=41.8','longitude<=-69.5','longitude>=-70.5'
)
#> <ERDDAP tabledap> cwwcNDBCMet
#> Path: [C:\Users\ANDREW~1.BEE\AppData\Local\Temp\RtmpSQAjbB\R\rerddap\c9f6726fc0a4f89d39c0c92913fabd90.csv]
#> Last updated: [2026-01-30 11:00:00.736641]
#> File size: [54.57 mb]
#> # A tibble: 501,476 × 20
#> apd atmp bar dewp dpd gst latitude longitude mwd ptdy station
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <chr>
#> 1 NaN 4.9 998. NaN NaN 9.8 41.7 -70.0 NA NaN CHTM3
#> 2 NaN 4.9 998. NaN NaN 11.9 41.7 -70.0 NA NaN CHTM3
#> 3 NaN 4.9 998 NaN NaN 11.3 41.7 -70.0 NA NaN CHTM3
#> 4 NaN 4.9 998 NaN NaN 10.8 41.7 -70.0 NA NaN CHTM3
#> 5 NaN 5 998. NaN NaN 10 41.7 -70.0 NA NaN CHTM3
#> 6 NaN 5 998. NaN NaN 11.4 41.7 -70.0 NA NaN CHTM3
#> 7 NaN 5.1 998. NaN NaN 9.9 41.7 -70.0 NA NaN CHTM3
#> 8 NaN 5 998. NaN NaN 11.6 41.7 -70.0 NA NaN CHTM3
#> 9 NaN 5 998. NaN NaN 8.9 41.7 -70.0 NA NaN CHTM3
#> 10 NaN 4.9 998. NaN NaN 10.6 41.7 -70.0 NA NaN CHTM3
#> # ℹ 501,466 more rows
#> # ℹ 9 more variables: tide <dbl>, time <dttm>, vis <dbl>, wd <int>, wspd <dbl>,
#> # wspu <dbl>, wspv <dbl>, wtmp <dbl>, wvht <dbl>
From this narrow geographic region there is only one station, CHTM3, which is a buoy off Chatham, MA.
Getting the data from ERDDAP™ into R requires a little work. You can use a combination of the R package rerddap and the ERDDAP™ website to help identify the data you are interested in.
Fortunately, for the buoy station data used in the example, there is an R package called buoydata available to help identify which buoy stations have data, with tools to pull station data hassle free.
Enjoy exploring the masses of data on ERDDAP™
An Open Researcher and Contributor ID (ORCID) is a 16-digit, persistent, name-independent unique identifier. From the ORCID site: “founded specifically to help solve the problem of name ambiguity in research and to enable transparent and trustworthy connections between researchers, their contributions, and their affiliations.”
So even if you change names or affiliations your ORCID remains the same and “follows” you through your career. Read more about ORCID from the official site.
And … It is free to obtain
Many publishers and institutions are using ORCID to seamlessly share information between data systems, so the next article you submit for publication, expect to see a field asking for it! Outside of publishers, you can use it in your R package development and link your software development skills to your scientific publishing.
Simply add it to the DESCRIPTION file of your R package and, if you use pkgdown (which you should :index_pointing_at_the_viewer:), it will propagate to the online documentation. For example:
Adding this snippet of code (of course, you’ll have your own 16-digit code when you sign up!)
Authors@R:
person(given = "Andy",family = "Beet", role = c("aut", "cre"),
email = "[email protected]",
comment = c(ORCID = "0000-0001-8270-7090"))
… results in pkgdown displaying hyperlinked icons in two locations, the home page and the citation page, something like this:

For more info on the benefits of using pkgdown to showcase your R package, you might find the post on Enhancing R packages useful.
So go on, get yourself an ORCID iD!!
You can see ALL of the components mentioned in this post implemented in the R package stocksmart
You’ve seen these. Many sites have these cool looking hexes; the Posit team has one for every R package in the tidyverse! The good news is that they are easy to create and implement.
The package hexSticker is a good place to start when creating a hex! Simply save your design as logo.png, add it to your man/figures folder, and link to it from your README.md with an image tag after the name of your package. Something like this:
# package_name <img src="man/figures/logo.png" align="right" width="120" />
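To actually generate the logo.png, a minimal hexSticker sketch might look like this (the subplot, colours, and sizing below are placeholders to tweak, not a recommended design):

```r
library(hexSticker)
library(ggplot2)

# placeholder subplot for the centre of the hex
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(size = 0.5, colour = "white") +
  theme_void() +
  theme_transparent()

# write the sticker where the README snippet expects it
sticker(
  p,
  package = "package_name",  # your package name
  p_size = 18,
  s_x = 1, s_y = 0.8, s_width = 1.3, s_height = 1,
  filename = "man/figures/logo.png"
)
```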
If you’d like others to use your package, then having a nice looking website with documentation is essential. A step made very easy using a combination of the pkgdown and the usethis packages.
- pkgdown will create the website locally from a single function call, build_site(). The default layout is pretty good right out of the box, but you can customize it a little if you’d like.
- usethis will create a GitHub action (with the functions use_github_action() or use_pkgdown_github_pages()) to redeploy your website every time you make changes to the code or documentation.

Not only will a website make your package more user friendly, it will highlight your documentation and show you where you need to focus more attention. A previous post on GitHub actions for R packages explains this in more detail.
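In practice, the setup boils down to two calls, run from within your package project (both have side effects on your repo, so treat this as a sketch rather than something to paste blindly):

```r
# one-time: configure pkgdown, create the GitHub Action workflow,
# and set up deployment to GitHub Pages
usethis::use_pkgdown_github_pages()

# any time: build and preview the site locally
pkgdown::build_site()
```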
Adding a helpful user guide or several articles to introduce a user to your package can enhance the adoption of your package by others. They are also pretty easy to create! If you know R Markdown or quarto, then it’s a breeze. Just save your rmd or qmd files in the vignettes folder of your package and pkgdown will take care of integrating them into your site.
Now, I’m sure your package is awesome!! But some users may want additional features that you didn’t anticipate, or maybe they found a bug in your code! I know, very unlikely, right? :rofl: The standard issue templates provided by GitHub work just fine as a reporting tool, but to avoid a lot of back and forth conversation with a user of your package, you can customize templates to ask for exactly the information you need to either troubleshoot a bug or add a feature.
These custom templates are written in YAML and are saved in the folder .github/ISSUE_TEMPLATE.
Adding a guide with instructions on how users can contribute to your package is worthwhile if you want to avoid potential future headaches! Although this does depend on whether people actually read the guidelines, which often they don’t! This begs the question: why bother? Well, the answer is that you can point people to the document when needed, without having to waste time explaining the process each time. The same applies to a code of conduct document, outlining what you expect from contributors regarding behaviour and the type of environment you want to foster.
Best practices suggest adding these two documents as markdown (.md) files in the root of your repository, with the names CODE_OF_CONDUCT.md and CONTRIBUTING.md. A great template to get you started can be found at @jessesquires.
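usethis can scaffold both files for you; for example (the contact address is a placeholder):

```r
# adds CODE_OF_CONDUCT.md to your package
usethis::use_code_of_conduct(contact = "you@example.com")

# adds a contributing guide (this helper follows tidyverse conventions)
usethis::use_tidy_contributing()
```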
Oh, once again pkgdown will take care of integrating them into your site automatically!
One of the most painful debugging experiences you might encounter is when someone informs you that your package won’t install on their OS! What do you do? You work on a Windows machine and a macOS user just informed you they are having issues installing your package! The answer is to use GitHub actions. As with the pkgdown example above, you can very easily create a workflow that will, for every pushed commit or pull request, run a series of checks (analogous to those required when submitting to CRAN) to inform you, amongst other things, whether your package can be installed without issues on a variety of operating systems!
Again, the usethis package has a function, use_github_action("check-standard"), that will create the appropriate workflow YAML for you. This workflow is analogous to the devtools::check() function you can run locally, which checks your package using “all known best practices”.
Probably one of the least utilized practices in package deployment! Not because unit tests are hard to implement, but because it can be hard to think about what kinds of tests you need or should implement.
Well, the Posit folks help with the implementation part of unit testing via the testthat package. This package has tools to set up the file structure, and includes many tools for different types of tests. And if you’ve implemented the R-CMD continuous integration workflow (described above), then all of the tests you create will be run whenever you push a commit or make a pull request!
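For a flavour of what a unit test looks like, here is a minimal hypothetical test file (the function under test is base R’s mean(), purely for illustration):

```r
# tests/testthat/test-example.R
library(testthat)

test_that("mean() handles missing values the way we rely on", {
  # na.rm = TRUE drops the NA before averaging
  expect_equal(mean(c(1, 2, NA), na.rm = TRUE), 1.5)
  # without na.rm the result is NA
  expect_true(is.na(mean(c(1, 2, NA))))
})
```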
These tests are pretty important, if thought about carefully, since they should catch many bugs BEFORE you release your package. And if you add new features or change some of the functionality, these tests should help determine whether the outputs are as expected and that you haven’t introduced unexpected bugs.
Versioning your package should be taken very seriously. Reproducibility is so important in today’s age of rapidly changing software. If you don’t version your software, your future self and others will find it extremely difficult to reproduce old code. In addition, versioning should be used to communicate changes, like bug fixes or new features, to users and other developers.
For details regarding when and how to version your package, see versioning R packages.
You can see ALL of the components mentioned in this post implemented in the R package stocksmart
All great questions! :rofl: Let’s talk this through …
So, in principle, every push of content to the main branch of your R package repository should trigger a versioning/release event. This statement assumes that you are using best practices and have adopted a branching strategy. This earlier post on selecting a branching strategy is worth a read if you are unsure of what this means.
Under the branching strategy I like to employ, called feature branching, all new features and bug fixes reside on their own branches until completion. At that point they are pulled into the development branch, often named dev, via a pull request, at which point multiple workflows are run to check various aspects of the code, typically R-CMD checks. Check out the post on enhancing your R packages for more info on this.
If all checks pass, then the development branch is ready to be pulled into the main branch. If they fail, you’ll need to address the issues and fix them. Better you do it now than have a user submit an issue at a later date.
When all checks pass you need to update the package version in the DESCRIPTION file and update content in the NEWS.md file to summarize all relevant changes to the package, whether a feature enhancement, a bug fix, or something else.
After you merge the pull request from dev -> main, you immediately release the package using the release feature on GitHub. The description of the release should use the information in the NEWS.md file, and the release version should match the version number in the DESCRIPTION file. The target commit associated with this release should be the latest commit to the main branch.
So this covers the how, when, and what. But what about the questions of how often a package should be released, or what versioning scheme should be used?
The versioning scheme that is considered best practice is semantic versioning. An earlier post, Versioning R packages, goes into more detail on how this relates to R package development.
And with regard to how often you should release, well, that depends on a lot of things: how frequently bugs are found and addressed, or how quickly you want to add new features. Of course, these do not need to be versioned as independent events. You can bundle new features and bug fixes into the same release of your package. You just need to adjust the version number to reflect these changes and document the changes in the NEWS.md file, often termed the changelog.
Under this workflow, at any point in time the main branch of the repository should represent a working released version of your package. If someone came across your repository and installed your package, it should be expected to work just fine! All developmental work, whether new features or bug fixes, should reside on its own branch and only be merged into the main branch when fully tested and ready for release! This way the main branch remains “clean” of issues.
Hope this helps!
In the development of R packages, it is no different. Let’s dive a little deeper with some examples. Following semantic versioning, we adopt the format MAJOR.MINOR.PATCH, for example v3.5.1. The interpretation of these three components can be a little confusing, especially in the context of R package development.
Suppose a package is currently at version 3.5.1:

- A backwards-compatible bug fix bumps the PATCH number: 3.5.2
- A backwards-compatible new feature bumps the MINOR number and resets PATCH: 3.6.0
- A breaking change bumps the MAJOR number and resets the rest: 4.0.0
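To make the semantic versioning bump rules concrete, here is a small hypothetical helper (not part of any package) that applies them:

```r
# bump a "MAJOR.MINOR.PATCH" version string; a sketch for illustration only
bump_version <- function(version, type = c("patch", "minor", "major")) {
  type <- match.arg(type)
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  switch(type,
    patch = parts[3] <- parts[3] + 1,
    minor = { parts[2] <- parts[2] + 1; parts[3] <- 0 },
    major = { parts <- c(parts[1] + 1, 0, 0) }
  )
  paste(parts, collapse = ".")
}

bump_version("3.5.1", "patch")  # "3.5.2"
bump_version("3.5.1", "minor")  # "3.6.0"
bump_version("3.5.1", "major")  # "4.0.0"
```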
I think you get it! Enough said.
If so, why not create an R package to version and document the data changes? That way you’ll always be able to go back to an older version, and you’ll be able to track what has changed through time.
You can do all of this using a GitHub action. Within the workflow, the steps you’ll need are (not limited to):
This seems like a lot, and it might take a while to get things set up and working correctly, but the benefits are huge!
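The skeleton of such a workflow might look like this (every step name, schedule, and script path here is an assumption; adapt it to your own data source):

```yaml
# .github/workflows/update-data.yaml - a hypothetical sketch
name: update-data
on:
  schedule:
    - cron: "0 6 1 * *"  # monthly; adjust to your data's cadence
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      # pull the latest data and rebuild the package data objects
      - run: Rscript data-raw/pull_data.R
      # commit changes, bump the version, and create a release if anything changed
      - run: Rscript data-raw/release_if_changed.R
```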
As a real example, consider stocksmart, an R data package that serves up data from all federally managed fish stocks in the USA. Our group relies on this data to update annual ecosystem reports. Having the data automatically pulled, versioned, and documented saves us a lot of time.
The specifics of this workflow end with a GitHub release created using the use_github_release function. You will need to add secrets to your repo so the email action and the GitHub release function behave as expected. Once set up, this workflow is a maintenance-light way to always have up-to-date data to work with in R, not only for you, but for the R community at large!
Happy coding!
Some examples:
Ideally, you want to be able to handle these situations within your code, adapt to the error, and continue.
In the examples above, solutions might be:
Using the tryCatch() function, bundled in base R, is a great option.
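Before the real-world examples, the basic shape of tryCatch() is worth seeing in isolation. A minimal sketch (safe_log is a made-up helper, not an existing function):

```r
# wrap a risky call so failures become a fallback value instead of stopping the script
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) {
      message("warning caught: ", conditionMessage(w))
      NA_real_
    },
    error = function(e) {
      message("error caught: ", conditionMessage(e))
      NA_real_
    }
  )
}

safe_log(100)  # works normally
safe_log(-1)   # log(-1) warns ("NaNs produced"), so we get NA back instead
```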
This example attempts to access an Oracle server to pull data. Custom messages are emitted based on the type of warning thrown.
chan <- tryCatch(
  {
    driver <- ROracle::Oracle()
    ROracle::dbConnect(driver, dbname = server, username = uid, password = pwd)
  },
  warning = function(w) {
    if (grepl("login denied", conditionMessage(w))) {
      message("Login to server failed - check username and password")
    }
    if (grepl("locked", conditionMessage(w))) {
      message("Logon to server failed - account may be locked")
    }
    message(paste0("Can not connect to database: ", server))
    return(NA)
  },
  error = function(e) {
    message(paste0("Terminal error: ", conditionMessage(e)))
    return(NA)
  }
)
Attempt to download a file from an online location. If it’s missing, or can’t be downloaded for some reason, the code skips the file
# get file, catch error for missing file
result <- tryCatch(
  {
    downloader::download(fpath, destfile = destfile, quiet = TRUE, mode = "wb")
    TRUE
  },
  error = function(e) {
    message(paste0("No data for ", afname))
    return(FALSE)
  },
  warning = function(w) {
    return(FALSE)
  }
)
if (!result) {
  next
}
In summary, error handling can save you a lot of frustration from code breaking prematurely!
Well, you can add release tags to any “old” commit, provided you can identify the commits! Now, this isn’t a recommended practice or a substitute for following best practices, but in a pinch it can help out “the younger, inexperienced you of years gone by”.
Turns out it is pretty simple too!
First, identify the commit hash. Second, decide on the tag name you want to assign to this commit. Then, run the git tag command.
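If you don’t know the hash, git log can help you find it; for example:

```shell
# list commits one per line with hash, date, and subject
git log --oneline --date=short --pretty=format:"%h %ad %s"
```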
For example, if your hash is 1e4b567712d785bb972665a2edd9401a17d9875d and you want to tag it with the name v1.3.1, then you’d run the following lines in the terminal
git tag v1.3.1 1e4b567712d785bb972665a2edd9401a17d9875d
git push origin v1.3.1
If you want to annotate the release, storing the tagger’s name, date, and time along with a message, you can use the following:
git tag -a v1.3.1 1e4b567712d785bb972665a2edd9401a17d9875d -m "Version 1.3.1 Release"
git push origin v1.3.1
Repeat this as many times as you’d like.
In GitHub you should be able to now create a release using this tag!
Make sure you add good release notes in the description field so you know why you tagged this particular point in time as a release point!
]]>