
Comparing changes

base repository: buriy/python-readability
base: master
...
head repository: mmirate/python-readability
compare: master
  • 8 commits
  • 1 file changed
  • 1 contributor

Commits on Apr 29, 2013

  1. Bolt de-pagination onto CLI entry point.

    `readability.main()` now depaginates using a recursive method.
    
    This may or may not work for articles with many pages.
    mmirate committed Apr 29, 2013
    6ef74c3
  2. Add encoding declaration to readability/readability.py.

    Otherwise Python chokes on the UTF-8 characters in the new regexes.
    mmirate committed Apr 29, 2013
    68b819e
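
The encoding declaration mentioned in this commit is the PEP 263 comment at the top of a source file. Python 2 assumes ASCII source and refuses to parse non-ASCII characters without it. A minimal sketch follows; the pattern `NEXT_LINK_TEXT` and the helper `looks_like_next_link` are hypothetical stand-ins, not the regexes actually added to `readability/readability.py`.

```python
# -*- coding: utf-8 -*-
# The line above is the PEP 263 encoding declaration. Python 2 needs it
# before it will parse non-ASCII characters in the source; Python 3 assumes
# UTF-8 by default, so there it is harmless.
import re

# Hypothetical pattern (not the one from readability.py): match link text
# that looks like a "next page" control, including the » character.
NEXT_LINK_TEXT = re.compile(u"(next|»)", re.IGNORECASE)


def looks_like_next_link(text):
    """Return True if the link text resembles a next-page link."""
    return NEXT_LINK_TEXT.search(text) is not None
```

The declaration must sit on the first or second line of the file to take effect.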
  3. Fix a stupid mistake in readability/readability.py

    `lxml.etree.XPath` was not imported properly.
    mmirate committed Apr 29, 2013
    a7c194a

Commits on May 7, 2013

  1. Make the depagination actually work.

    * Eliminate numerous careless mistakes.
    * Fix a silly, random typo from a previous commit.
    * Make the depaginator recursive like it should be.
    mmirate committed May 7, 2013
    1c09e3c

Commits on May 8, 2013

  1. Make various improvements and fixes.

    Depagination:
    * Activate only when new CLI option `-p` is given.
    * Remove several lines of dead code.
    * Restrict candidate next-page links to URLs whose
      scheme is either http or https.
    
    Full HTML document (i.e. non-fragment) output:
    * Include a meta-tag encoding declaration.
    * Include the shortened title in html>head>title.
    mmirate committed May 8, 2013
    95fa0a5
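
The two changes to full-document output in this commit (a meta-tag encoding declaration and the shortened title in `html>head>title`) could look roughly like the sketch below. The function name `wrap_fragment` and its parameters are assumptions made for illustration; the commit's actual code is not shown on this page.

```python
from html import escape


def wrap_fragment(body_html, short_title, encoding="utf-8"):
    """Wrap an article fragment in a minimal full HTML document."""
    # body_html is assumed to be trusted, already-cleaned markup, so it is
    # inserted as-is; the title and encoding are escaped as plain text.
    return (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=%s"/>'
        "<title>%s</title>"
        "</head><body>%s</body></html>"
    ) % (escape(encoding), escape(short_title), body_html)
```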
  2. Fix the URL filtering for the depaginator.

    Previously, any URL without an explicit scheme (that is,
    any relative URL) was rejected. Now only URLs that use
    the "javascript" scheme are rejected.
    
    Note that other pathological cases may exist.
    mmirate committed May 8, 2013
    170955a
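
The revised filter described in this commit can be sketched as follows: relative links are resolved against the current page instead of being discarded, and only `javascript:` links are dropped. The helper name `resolve_next_link` is hypothetical, and the sketch uses Python 3's `urllib.parse` rather than the Python 2 `urlparse` module that a 2013 codebase would likely have used.

```python
from urllib.parse import urljoin, urlparse


def resolve_next_link(base_url, href):
    """Resolve href against base_url; return None for javascript: links."""
    href = href.strip()
    # urlparse lowercases the scheme; the extra lower() is defensive.
    if urlparse(href).scheme.lower() == "javascript":
        return None
    # Relative links have an empty scheme and are resolved, not rejected.
    return urljoin(base_url, href)
```

As the commit message notes, other pathological cases may exist; this sketch handles only the one it names.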
  3. Disallow infinite depagination recursion.

    Should infinite recursion be a sign that some pages are faulty?
    Which ones?
    
    For the record, the website that triggered the need for restrictions
    on depagination was http://www.lifehacker.com since that site now
    has a link whose text reads "Learn More »"
    [they had a redesign to which that text refers]; "»" is the
    "Right Angle QUOte" that is present in one of the regexes.
    mmirate committed May 8, 2013
    bf7b59f
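
One common way to bound a recursive depaginator is to remember the URLs already visited and to cap the number of pages. The sketch below shows that idea only; it is not necessarily how this commit does it. `depaginate`, `fetch`, `find_next`, and `max_pages` are all hypothetical names.

```python
def depaginate(url, fetch, find_next, seen=None, max_pages=20):
    """Collect page texts by following next-page links recursively.

    fetch(url) returns the page text; find_next(url, text) returns the
    next URL or None. Both are hypothetical callables from the caller.
    """
    if seen is None:
        seen = set()
    # Stop on a revisited URL (a cycle) or once the page cap is reached.
    if url in seen or len(seen) >= max_pages:
        return []
    seen.add(url)
    text = fetch(url)
    next_url = find_next(url, text)
    if next_url is None:
        return [text]
    return [text] + depaginate(next_url, fetch, find_next, seen, max_pages)
```

With two fake pages that link to each other, the visited set ends the walk after each page has been read once:

```python
pages = {"p1": ("one", "p2"), "p2": ("two", "p1")}
result = depaginate("p1", lambda u: pages[u][0], lambda u, t: pages[u][1])
```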
  4. Fix another batch of silly mistakes.

    * Support non-UTF8 encodings by actually passing the `encoding`
      parameter from `main()` to `summary()`.
    * Compose depaginated articles together properly and intelligently.
    * Give the recursive depagination function a more proper name.
    mmirate committed May 8, 2013
    dc3e0cd