This is a very messy citation exporter for Informit.org. For some strange reason, despite being run by an Australian university, Informit does not support exporting AGLC-compliant citations.
- Start with a list of Informit DOIs - perhaps you were saving articles while doing your research
- Run `injest.js` to build the API call. `injest.js` expects a `good.links` file with a list of the DOIs you want to export. You can use other tools (awk/sed) to build the request.
- Retrieve an API response from the URL generated by the previous step with, e.g. `wget -O results.json <url>`
- Run `index.js` - this will parse the information in `results.json` and output a 95% AGLC4-compliant list of references in markdown format.
- Run `index.js bib` to format the reference list as an AGLC bibliography by reversing the first and last names of the first-listed author and removing the full stop at the end of references.
- Optional: convert Markdown to Word doc with [pandoc](https://pandoc.org/): `pandoc -s --ascii -o aglc.doc aglc.md`
Caution: manual review is required. This script tries to comply with AGLC4, but I cannot guarantee perfect coverage. Sometimes there are AGLC4 edge cases which I have not considered (such as journal articles spanning multiple years), and sometimes there are deficiencies in the API response, such as missing delimiters in `container-title` or missing keys.
The old method relied on scraping the HTML from the Informit website to build citation information and save it in a JSON dictionary. A second script would then convert the dictionary into Markdown. The markdown could be converted to doc/rtf with pandoc.
The web scraping method produces inconsistent results. Some pages just do not render the tags I rely upon for citation information. Fortunately, Informit exposes an API which they themselves use to export citations. It supports batch jobs, too!
The URL takes the form of:
`https://data.informit.org/action/exportCiteProcCitation?dois=DOI_LIST_HERE&targetFile=custom-refWorks&format=text`
The DOI list is comma-separated, and each DOI must include the Informit prefix `10.3316/`.
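Building that URL can be sketched in a few lines of node. The function name and sample DOIs below are made up for illustration; the endpoint, query parameters and `10.3316/` prefix come from this README:

```javascript
// Sketch: assemble the exportCiteProcCitation URL from a list of DOIs.
// buildExportUrl and the sample DOIs are hypothetical.
const PREFIX = '10.3316/';
const ENDPOINT = 'https://data.informit.org/action/exportCiteProcCitation';

function buildExportUrl(dois) {
  // Prepend the Informit prefix where it is missing, then comma-join.
  const list = dois
    .map((doi) => (doi.startsWith(PREFIX) ? doi : PREFIX + doi))
    .join(',');
  return `${ENDPOINT}?dois=${list}&targetFile=custom-refWorks&format=text`;
}

console.log(buildExportUrl(['informit.111111111', '10.3316/informit.222222222']));
```

This is roughly what `injest.js` needs to do with the contents of `good.links`.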
The workflow is to:
- Get the DOIs of the files we want to cite with, say, `ls -1 /path/to/saved/articles > good.links` (assuming files were downloaded in the first place and filenames weren't changed)
- Build the API call - note that local filenames won't have the `10.3316` prefix
- Fire the call and retrieve the JSON - at the moment this needs to be done with `wget`, as `got` does not handle all the redirects and cookies that the API requires
- Process the JSON citations and write them to Markdown
- Bonus: write separate bibliography and footnote files
- Bonus: `child_process.exec` pandoc conversion
The API returns a JSON response. Two top-level keys are of relevance: `items`, which is an array of citation objects; and `exportedDoiLength`, which is self-explanatory and may be redundant if you just use `items.length`.
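To make that shape concrete, here is a sketch that walks `items`. The sample response is invented; only the key names come from the API:

```javascript
// Sketch: iterate the citation objects in `items`.
// This sample response is made up, shaped as the README describes.
const response = {
  exportedDoiLength: 1,
  items: [
    { '10.3316/informit.example': { title: 'Example Article' } },
  ],
};

// Each element of `items` is an object keyed by DOI.
const citations = response.items.flatMap((item) => Object.entries(item));

for (const [doi, fields] of citations) {
  console.log(doi, fields.title);
}
```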
A citation object looks like:
```
{
  "<doi>": {
    // "<citation-key>": "<citation-value>"
    // ...
  }
}
```

Citation fields vary between articles. These are the most relevant:
- `author` - an array of author objects, each containing a `family` and a `given` key, like:

  ```
  "author": [
    {
      "family": "COOPER",
      "given": "RE"
    }
  ]
  ```

- `issued.date-parts[0]` - the year the article was published. The date is presented as an array, so just look at the first element
- `container-title` - this can be ambiguous: it contains either the name of the journal only (if separate keys exist for `volume`, `issue` and/or `page`), or journal name + volume and issue + page range
- `volume`, `issue` and `title` - are self-explanatory
- `page` - the page range for the article, where the start and end pages are separated by a dodgy unicode en dash or hyphen
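Reading those fields from a single citation object might look like the sketch below. The citation data is invented; the key names and the en-dash caveat are from the list above:

```javascript
// Sketch: extract the fields listed above from one citation object.
// Values are made up; key names follow the README's description,
// including `date-parts[0]` holding the year.
const citation = {
  author: [{ family: 'COOPER', given: 'RE' }],
  issued: { 'date-parts': [2019] },
  'container-title': 'Example Law Journal',
  volume: '12',
  issue: '3',
  page: '45\u201367', // en dash between start and end pages
};

const year = citation.issued['date-parts'][0];
const firstAuthor = `${citation.author[0].given} ${citation.author[0].family}`;

// The page range may use an en dash or a plain hyphen.
const [startPage, endPage] = citation.page.split(/[\u2013-]/);

console.log(`${firstAuthor} (${year}), pp ${startPage}-${endPage}`);
```

Splitting on both `\u2013` and `-` is what makes the dodgy delimiter tolerable.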
In the browser console:
```
let source = window.location.href;
let parsed = source.split('/');
parsed.splice(4, 0, "pdf");
let dest = parsed.join('/');
dest += '?download=true';
location.replace(dest);
```

Code licensed Apache 2. Documentation licensed CC-BY-SA.
This project uses node.js, got and lodash.
- Building the API call and retrieving the API response currently requires two separate steps/scripts. While the API does not require credentials, it nevertheless sets and checks for certain cookies and headers that `got` does not handle by default. I don't have the time to look into it for now, so just use `wget`.
- I'm not using async code 100% effectively