Article

Automatically extract clean article text and other data from news articles, blog posts and other text-heavy pages.

Query Params
string
required
Defaults to https://www.technologyreview.com/2020/09/04/1008156/knowledge-graph-ai-reads-web-machine-learning-natural-language-processing/

Target URL to extract
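As a minimal sketch of how the target URL might be passed, the snippet below builds a request URL with percent-encoding. The endpoint address and the parameter names `token` and `url` are assumptions that do not appear in this section; check them against the API reference before use.

```python
from urllib.parse import urlencode

# Assumed endpoint; not stated in this section.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_article_request(token: str, target_url: str, **optional: str) -> str:
    """Return a GET URL with the target URL safely percent-encoded.

    `token` and `url` are assumed parameter names; `optional` carries any
    further query parameters as plain strings.
    """
    params = {"token": token, "url": target_url, **optional}
    return f"{ARTICLE_ENDPOINT}?{urlencode(params)}"


# The target URL carries its own query string, so it must be encoded.
request_url = build_article_request("YOUR_TOKEN", "https://example.com/post?id=1")
print(request_url)
```

Encoding matters here because an unencoded `?` or `&` inside the target URL would otherwise be read as part of the API request's own query string.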

string
enum

Specify optional fields to be returned from any fully-extracted pages (e.g. fields=querystring,links)
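The comma-separated form shown in the example (`fields=querystring,links`) could be assembled as follows. This is only a sketch; the helper name `fields_param` is hypothetical, and the set of allowed field names is not listed in this section.

```python
from urllib.parse import urlencode


def fields_param(names):
    """Join optional field names into one comma-separated value."""
    cleaned = [name.strip() for name in names if name.strip()]
    return ",".join(cleaned)


# safe="," keeps the commas readable instead of encoding them as %2C.
fields_query = urlencode({"fields": fields_param(["querystring", "links"])}, safe=",")
print(fields_query)
```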

int32

Sets the time in milliseconds to wait for content to be fetched from the requested URL. The default timeout for the third-party response is 30 seconds (30000 milliseconds).

string

Use for JSONP requests. Needed for cross-domain Ajax.

string

Specify an IP address of a custom proxy that will be used to fetch the target page. (Ex: &proxy or &proxy=0.0.0.0)

string

Used to specify the authentication parameters that will be used with a custom proxy specified in the &proxy parameter. (Ex: proxyAuth=username:password)
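A rough sketch of combining the `proxy` and `proxyAuth` parameters described above. The helper name `proxy_params` is hypothetical, and the IP address and credentials are the placeholder values from the examples in this section.

```python
from urllib.parse import urlencode


def proxy_params(proxy_ip, username=None, password=None):
    """Build the proxy-related query parameters.

    proxyAuth is only added when both credentials are supplied, using the
    username:password form shown in the parameter description.
    """
    params = {"proxy": proxy_ip}
    if username is not None and password is not None:
        params["proxyAuth"] = f"{username}:{password}"
    return params


proxy_query = urlencode(proxy_params("0.0.0.0", "username", "password"))
print(proxy_query)
```

Note that the colon in the credentials is percent-encoded as `%3A` once it is placed in a query string.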

string

Set to default to use Diffbot's datacenter proxy for this request. Set to none to instruct Extract not to use proxies, even if proxies have been enabled globally for this particular URL.

boolean

Pass paging=false to disable automatic concatenation of multiple-page articles.

int32

Set the maximum number of automatically-generated tags to return. (Default: 10)

float

Set the minimum relevance score of tags to return, between 0.0 and 1.0. By default only tags with a score equal to or above 0.5 will be returned.

float

Set the minimum relevance score of categories to return, between 0.0 and 1.0. By default only categories with a score equal to or above 0.5 will be returned.
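The threshold rule for both tags and categories ("equal to or above") can be illustrated with a small local filter. This is not the service's implementation: the function name and the sample labels and scores are made up for illustration, and the response format is not described in this section.

```python
def keep_by_score(items, min_score=0.5):
    """Keep (label, score) pairs whose score is at or above min_score."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    return [(label, score) for label, score in items if score >= min_score]


# Illustrative data only; real scores come from the API response.
sample = [("machine learning", 0.91), ("web", 0.5), ("sports", 0.12)]
kept_default = keep_by_score(sample)
kept_strict = keep_by_score(sample, min_score=0.9)
print(kept_default)
print(kept_strict)
```

With the default of 0.5, an item scoring exactly 0.5 is kept, because the comparison is inclusive.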

boolean

Pass discussion=false to disable automatic extraction of article comments.

string
enum

Run extracted text and title through the Diffbot Natural Language API. Example: &naturalLanguage=entities,facts,categories,sentiment.

int32

Sets the maximum number of sentences for summary generation when using naturalLanguage=summary (Default: 3).

integer
≤ 180000

Add extra time for rendering before the page is closed and the DOM is extracted. The extra delay can cause page timeouts, so the timeout parameter may also need to be raised. Note that the renderer closes automatically at 180 seconds.
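One way to reason about the interaction between the extra rendering time and the timeout is to budget them together. The sketch below assumes the extra time is given in milliseconds (suggested by the 180000 limit) and uses a simple heuristic of adding it to the 30000 ms default timeout. That heuristic and the function name are assumptions, not documented behavior.

```python
DEFAULT_TIMEOUT_MS = 30_000   # default timeout stated in this section
RENDERER_LIMIT_MS = 180_000   # renderer closes automatically at 180 seconds


def timeout_for_delay(extra_render_ms, base_timeout_ms=DEFAULT_TIMEOUT_MS):
    """Suggest a timeout that leaves room for the extra rendering time."""
    if not 0 <= extra_render_ms <= RENDERER_LIMIT_MS:
        raise ValueError("extra rendering time must be between 0 and 180000 ms")
    # Heuristic: extend the base timeout by the requested extra time.
    return base_timeout_ms + extra_render_ms


suggested = timeout_for_delay(10_000)
print(suggested)
```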

string
enum

Direct the browser to scroll down the page, to trigger lazy-loaded content.
