Scraping Comics
This page describes how to scrape comics, either individual or in a batch.
Scraping a comic is the process of retrieving details (metadata) for a comic from a source and associating that data with the comic.
One of the most popular sites for comic metadata is ComicVine. Its database holds a large amount of data, which it exposes via a web interface. It is one of the metadata sources that ComiXed uses to let administrators update each comic in their library with the most up-to-date issue details available.
ComiXed can retrieve metadata from different sources, such as ComicVine. To use such a site, the metadata source must be installed and configured.
When the metadata source is installed, it will show up on the Metadata Sources page. You will need to enter the required properties for the source (for example, ComicVine requires a valid API key).
Then, when scraping a comic, the source can be selected. When the scraping is performed, that comic is associated with that source. Any future rescraping of metadata goes directly to that source until a different source is used.
A comic can only be associated with one source at a time for its metadata.
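The association rule above can be illustrated with a minimal sketch. The `Comic` class and both functions are hypothetical names chosen for this example and are not ComiXed's actual data model; the sketch only shows that each scrape replaces the previous association, so a comic never has more than one source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comic:
    """Hypothetical stand-in for a library entry; not ComiXed's real model."""
    series: str
    issue_number: str
    metadata_source: Optional[str] = None  # at most one source at a time
    reference_id: Optional[str] = None


def record_scrape(comic: Comic, source: str, reference_id: str) -> None:
    """Associate the comic with the source used for this scrape.

    Any earlier association is overwritten, so only one source is recorded.
    """
    comic.metadata_source = source
    comic.reference_id = reference_id


def rescrape_source(comic: Comic) -> Optional[str]:
    """Return the source a rescrape would go to, or None if never scraped."""
    return comic.metadata_source
```

Scraping the same comic later with a different source simply overwrites the stored source and ID, which matches the one-source-at-a-time behavior described above.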
By default, ComiXed comes with support for the ComicVine metadata source, but others can be installed using the following steps:
- Stop the ComiXed server if it's running.
- Copy the new metadata source JAR file into the library directory.
- Restart the ComiXed server.
To configure a metadata source:
- Log into the application as an admin user,
- go to the Configuration page,
- click on the Metadata Sources tab,
- expand the properties for a source, then
- click Save.
To scrape a single comic:
- Go to a comic's details page,
- open the Edit Details tab for that comic,
- select a metadata source, and then
- click on the Scrape This Comic button in the edit toolbar.
The minimal data needed to scrape a comic is the series name and the issue number. Other data, such as the volume, helps the automated selection system to find the best set of data for the comic.
After pressing the scraping button, ComiXed will go to the metadata source and download a set of volumes that most closely match the details for the comic being scraped. They are then presented in a table.
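How ComiXed ranks candidates internally is not described here, so the following is only a sketch of one plausible approach: score each candidate volume by name similarity, with a bonus when the optional volume hint matches. The `Volume` type, its `start_year` field, the bonus weight, and the use of `difflib` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple, Optional


class Volume(NamedTuple):
    """Hypothetical candidate returned by a metadata source."""
    name: str
    start_year: str


def score(series: str, volume_hint: Optional[str], candidate: Volume) -> float:
    """Name similarity in [0, 1], plus 0.5 if the volume hint matches."""
    similarity = SequenceMatcher(
        None, series.lower(), candidate.name.lower()
    ).ratio()
    bonus = 0.5 if volume_hint and volume_hint == candidate.start_year else 0.0
    return similarity + bonus


def rank_volumes(
    series: str, volume_hint: Optional[str], candidates: List[Volume]
) -> List[Volume]:
    """Return candidates ordered from best to worst match."""
    return sorted(
        candidates,
        key=lambda c: score(series, volume_hint, c),
        reverse=True,
    )
```

With candidates named "Batman" (1940), "Batman" (2011), and "Batgirl" (2011), a search for "Batman" with the hint "2011" puts the 2011 "Batman" volume first, which illustrates why supplying the volume helps narrow the results.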
Clicking on a row will download the cover and present it to you for comparison to your comic. Once you've found the correct entry, clicking the Accept button will update your database for that comic with all of the known data from the metadata source.
You can cancel scraping by clicking on the Stop Scraping button in the toolbar at any point before accepting an entry, without affecting any data.
After the first time you've scraped a comic, or if you know the metadata source ID for a comic, you can scrape it directly without having to first search for the appropriate volume or issue.
If the metadata source ID is already present on the comic, you do not need to enter it. If it is not, and you know it, you can enter the value (it's a 6-digit number) into the Reference ID field on the comic editing tab.
With the ID present, the Scrape using the Reference ID button is enabled in the toolbar. After confirming the action, this will scrape the comic details without any further user interaction.
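The choice between the two scraping paths can be summarized in a short sketch. The function name and the returned labels are hypothetical; the logic only restates the rules above: a present Reference ID allows direct scraping, and otherwise the series name and issue number are the minimum needed to search.

```python
from typing import Optional


def scrape_mode(
    series: Optional[str],
    issue_number: Optional[str],
    reference_id: Optional[str],
) -> str:
    """Pick how a comic could be scraped, following the rules described above."""
    if reference_id and reference_id.strip():
        # Reference ID present: scrape directly, no volume search needed.
        return "direct"
    if series and series.strip() and issue_number and issue_number.strip():
        # Minimal data for a search: series name and issue number.
        return "search"
    # Not enough data to scrape yet.
    return "incomplete"
```

For example, `scrape_mode("Saga", "1", None)` returns `"search"`, while any comic with a non-empty Reference ID returns `"direct"` regardless of its other fields.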
To scrape multiple comics, you need to:
- select one or more comics while on any list page, and then
- click on the Start scraping comics button on the list page toolbar.
This process differs from scraping a single comic only in that it shows you which comics have yet to be processed. Otherwise, the interaction is identical.
ComiXed can batch update the metadata in comics. With this process, comic books that have an associated metadata source and reference ID can have their metadata updated in a batch. The process runs in the background on the server and is initiated by an administrator.
To begin this process, select one or more comic books from any page. Then go to the side navigation bar and select the Update Metadata Process link. The toolbar on that page has a start button. Clicking it and confirming will kick off a batch process on the server. The process takes each selected comic that has a metadata source and reference ID and attempts to retrieve the updated metadata for it. If the retrieval succeeds, the comic book is updated in the database.
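The batch behavior described above can be sketched as a simple loop. This is not ComiXed's server code: the `Comic` class, the `fetch` callback, and its convention of returning `None` on failure are assumptions made so the example is self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Comic:
    """Hypothetical stand-in for a library entry."""
    title: str
    metadata_source: Optional[str] = None
    reference_id: Optional[str] = None
    metadata: Dict[str, str] = field(default_factory=dict)


# Hypothetical fetcher: (source, reference_id) -> metadata, or None on failure.
Fetcher = Callable[[str, str], Optional[Dict[str, str]]]


def batch_update(selected: Iterable[Comic], fetch: Fetcher) -> List[Comic]:
    """Update every eligible comic and return the ones that were updated."""
    updated: List[Comic] = []
    for comic in selected:
        if not comic.metadata_source or not comic.reference_id:
            continue  # no source or reference ID: not eligible, skipped
        fresh = fetch(comic.metadata_source, comic.reference_id)
        if fresh is None:
            continue  # retrieval failed: stored data is left untouched
        comic.metadata = fresh
        updated.append(comic)
    return updated
```

Comics without a source or reference ID are skipped rather than treated as errors, so a mixed selection can be submitted safely.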
The server supports creating external (as opposed to embedded) metadata files. With this, the metadata is written to a file separate from the comic. The contents are identical to those of the embedded file; they are simply maintained externally.
To enable this feature, go to Configuration -> Library and make sure the Create external metadata files for each comic feature is checked.
In addition, if the Do not create or update internal metadata files option is checked, ComiXed will not create or change any embedded metadata file. If an external file is found, its data takes precedence over the embedded file's contents.
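The interaction of the two options can be sketched as follows. The function names, parameter names, and the `"embedded"`/`"external"` labels are hypothetical, and the sketch assumes the embedded file is written whenever the second option is left unchecked.

```python
from typing import Dict, List, Optional

Metadata = Dict[str, str]


def effective_metadata(
    embedded: Optional[Metadata], external: Optional[Metadata]
) -> Optional[Metadata]:
    """External data takes precedence whenever an external file was found."""
    return external if external is not None else embedded


def files_to_write(create_external: bool, skip_internal: bool) -> List[str]:
    """List which metadata files would be written for the given settings."""
    targets: List[str] = []
    if not skip_internal:
        targets.append("embedded")  # assumed default when the option is off
    if create_external:
        targets.append("external")
    return targets
```

With both options checked, `files_to_write(True, True)` returns `["external"]`, meaning the comic file itself is left unmodified.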