Web

XLinq to Web

HTML => XML + CSS with XLinq 🤘

Open Source Maintenance Fee

To ensure the long-term sustainability of this project, users of this package who generate revenue must pay an Open Source Maintenance Fee. While the source code is freely available under the terms of the License, this package and other aspects of the project require adherence to the Maintenance Fee.

To pay the Maintenance Fee, become a Sponsor at the proper OSMF tier. A single fee covers all Devlooped packages.

Read HTML as XML and query it with CSS over XLinq (or HtmlAgilityPack killer 😉). Provides HtmlDocument.Load and CssSelectElement(s) extension methods for XDocument/XElement.

No need to learn an entirely new object model for a page 🤘. This makes it the most productive and lean library for web scraping using the latest and greatest that .NET can offer.

Usage

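using System.Collections.Generic;
using System.Xml.Linq;
using Devlooped.Web;

// Load the HTML file as a regular XDocument
XDocument page = HtmlDocument.Load("page.html");
// Select all matching elements with a CSS selector
IEnumerable<XElement> elements = page.CssSelectElements("div.menuitem");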

XElement title = page.CssSelectElement("html head meta[name=title]");
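
For comparison, here is a rough sketch of the hand-written XLinq query that the div.menuitem selector saves you from typing. It uses only System.Xml.Linq, the sample markup is made up for illustration, and it is an approximation of the selector's meaning, not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical markup standing in for a loaded page.
var page = XDocument.Parse(
    "<html><body>" +
    "<div class=\"menuitem active\">Home</div>" +
    "<div class=\"footer\">About</div>" +
    "</body></html>");

// Hand-written counterpart of "div.menuitem": every div whose
// space-separated class list contains the token "menuitem".
IEnumerable<XElement> elements = page
    .Descendants("div")
    .Where(e => (e.Attribute("class")?.Value ?? "")
        .Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .Contains("menuitem"));

foreach (var element in elements)
    Console.WriteLine(element.Value); // prints "Home"
```

The CSS form stays a single string however complex the query gets, while the LINQ form grows with every extra condition.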

By default, HtmlDocument.Load skips the non-content elements script and style, converts all element names to lower case, and ignores all XML namespaces (useful when loading XHTML, for example) for easier querying. These options, as well as granular whitespace handling, can be configured using the overloads that receive an HtmlReaderSettings.

The underlying parsing is performed by the amazing SgmlReader library by Microsoft’s Chris Lovett.

In addition, the following extension methods make it easier to work with XML documents where you want to query with CSS or XPath without having to deal with XML namespaces:

using System.Xml;
using System.Xml.Linq;
using Devlooped.Web;

var doc = XDocument.Load("doc.xml");
// Removes all xmlns declarations so elements can be queried
// as if none had namespaces; returns the root element
XElement nons = doc.RemoveNamespaces();

// Alternatively, you can ignore namespaces at the XmlReader level
using var reader = XmlReader.Create("doc.xml").IgnoreNamespaces();
doc = XDocument.Load(reader);

// Finally, you can also skip elements at the reader level
// (a distinct variable name is needed within the same scope)
using var skipping = XmlReader.Create("doc.xml").SkipElements("foo", "bar");
doc = XDocument.Load(skipping);
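
To see why dropping namespaces helps, the sketch below uses plain System.Xml.Linq (no Devlooped.Web calls) on a made-up XHTML fragment. It shows that an unqualified element name matches nothing once a default namespace is declared, so every name in a query would otherwise need the namespace prefix.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical XHTML fragment, used only for illustration.
var xhtml = XDocument.Parse(
    "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
    "<body><div class=\"menuitem\">Home</div></body></html>");

// An unqualified name finds nothing: every element lives in the XHTML namespace.
int unqualified = xhtml.Descendants("div").Count();

// Plain XLinq requires the namespace on each name in the query.
XNamespace ns = "http://www.w3.org/1999/xhtml";
int qualified = xhtml.Descendants(ns + "div").Count();

Console.WriteLine($"{unqualified} {qualified}"); // prints "0 1"
```

With the namespaces removed or ignored as shown above, the unqualified form is the one that matches.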

CSS

At the moment, the library supports the following CSS selector features:

And all combinators

Non-CSS features:

Dogfooding

We also produce CI packages from branches and pull requests so you can dogfood builds as quickly as they are produced.

The CI feed is https://pkg.kzu.app/index.json.

The versioning scheme for packages is:

Sponsors

Clarius Org MFB Technologies, Inc. SandRock DRIVE.NET, Inc. Keith Pickford Thomas Bolon Kori Francis Uno Platform Reuben Swartz Jacob Foshee Jonathan Ken Bonny Simon Cropp agileworks-eu Zheyu Shen Vezel ChilliCream 4OTC domischell Adrian Alonso Michael Hagedorn torutek mccaffers Seika Logiciel Andrew Grant