Comparing changes
base repository: buriy/python-readability
base: master
head repository: mmirate/python-readability
compare: master
- 8 commits
- 1 file changed
- 1 contributor
Commits on Apr 29, 2013
Bolt de-pagination onto CLI entry point.
`readability.main()` now depaginates using a recursive method. This may or may not work for articles with many pages.
Commit 6ef74c3
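The commit does not show how the recursive depagination works, so here is a minimal sketch of the idea, assuming a page is fetched through a caller-supplied `fetch(url)` function and that the next page is the first link whose text matches a next-page pattern. The names `NEXT_TEXT_RE`, `find_next_url` and `depaginate` are hypothetical, and the pattern is deliberately simplified. Note that nothing here stops a page that links back to an earlier one from recursing forever.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical, simplified next-page pattern (not the commit's regex).
NEXT_TEXT_RE = re.compile(r"next|continue|»", re.I)


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_next_url(html, base_url):
    """Return the absolute URL of the first next-page-looking link, or None."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    for href, text in collector.links:
        if NEXT_TEXT_RE.search(text):
            return urljoin(base_url, href)
    return None


def depaginate(url, fetch):
    """Recursively collect the HTML of `url` and of every following page."""
    html = fetch(url)
    next_url = find_next_url(html, url)
    if next_url is None:
        return [html]
    return [html] + depaginate(next_url, fetch)
```

For a quick check, `fetch` can simply be a dictionary lookup, e.g. `depaginate("http://example.com/a/1", pages.__getitem__)`. Recursion keeps the code short, but each extra page adds a stack frame, which is one reason very long articles could be a problem.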
Add encoding declaration to readability/readability.py.
Without it, Python 2 refuses to compile the module because of the non-ASCII (UTF-8) characters in the new regexes.
Commit 68b819e
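For reference, this is what such a declaration looks like, following PEP 263: a magic comment on the first or second line of the file. The regex below is only an illustration containing a non-ASCII character (U+00BB, "»"); it is not the one added by the commit.

```python
# -*- coding: utf-8 -*-
# PEP 263 encoding declaration: it must be on line 1 or 2 of the file.
# Python 2 needs it before accepting non-ASCII literals like "»" below;
# Python 3 already assumes UTF-8 source, so there it is merely redundant.
import re

# Illustrative pattern with a literal U+00BB RIGHT-POINTING DOUBLE ANGLE
# QUOTATION MARK, the kind of character that triggered the error.
NEXT_LINK_RE = re.compile(u"(next|continue|»)", re.I)
```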
Fix a stupid mistake in readability/readability.py
`lxml.etree.XPath` was not imported properly.
Commit a7c194a
Commits on May 7, 2013
Make the depagination actually work.
* Eliminate numerous careless mistakes.
* Fix a silly, random typo from a previous commit.
* Make the depaginator recursive, as it should be.
Commit 1c09e3c
Commits on May 8, 2013
Make various improvements and fixes.
Depagination:
* Activate only when the new CLI option `-p` is given.
* Remove several lines of dead code.
* Restrict candidate next-page links to URLs whose protocol is either http or https.
Full HTML document (i.e. non-fragment) output:
* Include a meta-tag encoding declaration.
* Include the shortened title in html>head>title.
Commit 95fa0a5
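The commit lists what full-document output now includes, but not the exact markup. A minimal sketch, using a hypothetical `wrap_document` helper, could look like this. It uses the HTML 4 `http-equiv` form of the encoding declaration; the shorter HTML5 form `<meta charset="...">` serves the same purpose. The title is escaped, while the article body is assumed to already be HTML.

```python
from html import escape


def wrap_document(body_html, title, encoding="utf-8"):
    """Wrap an extracted article fragment in a complete HTML document that
    declares its character encoding and carries the (shortened) title.
    Hypothetical helper; the commit's actual markup may differ."""
    return (
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset={enc}">\n'
        "<title>{title}</title>\n"
        "</head>\n<body>\n{body}\n</body>\n</html>"
    ).format(enc=escape(encoding), title=escape(title), body=body_html)
```

The declared charset only helps if the bytes written out actually use it, e.g. `wrap_document(body, title, "iso-8859-1").encode("iso-8859-1")`.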
Fix the URL filtering for the depaginator.
Previously, any URL that was not protocol-absolute (i.e. did not start with an http or https protocol) was rejected. Now only the trivially pathological "javascript" URL protocol is rejected. Other pathological cases may still exist.
Commit 170955a
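The new rule can be sketched with the standard library alone, assuming each candidate `href` is resolved against the URL of the page it appears on. The function name is hypothetical. Relative links such as `2` or `?p=2` now survive and are made absolute, while `javascript:` links are dropped. As the commit message warns, other pathological schemes (for instance `mailto:`) would still pass this check.

```python
from urllib.parse import urljoin, urlsplit


def candidate_next_url(href, base_url):
    """Resolve `href` against the page URL, rejecting only javascript: links.

    Hypothetical helper illustrating the filtering rule described above."""
    if href is None:
        return None
    # urlsplit() lowercases the scheme, so "JavaScript:" is caught too.
    if urlsplit(href.strip()).scheme == "javascript":
        return None
    return urljoin(base_url, href)
```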
Disallow infinite depagination recursion.
Should infinite recursion be taken as a sign that some pages are faulty, and if so, which ones? For the record, the website that made restrictions on depagination necessary was http://www.lifehacker.com: the site now has a link whose text reads "Learn More »" (referring to its redesign), and "»" is the "Right Angle QUOte" (`&raquo;`) that appears in one of the regexes.
Commit bf7b59f
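The commit message does not say how recursion is bounded. One plausible approach, sketched here, combines a set of already-visited URLs with a hard page cap; `MAX_PAGES` and its value are hypothetical, and `fetch`/`find_next` are caller-supplied stand-ins for downloading a page and locating its next-page link. The pattern below is modeled on the well-known readability.js next-link regex and is used only to show why a "Learn More »" link looks like a next-page link.

```python
import re

# Illustrative next-link pattern; note the trailing alternative for "»".
NEXT_LINK_RE = re.compile(r"(next|weiter|continue|>([^|]|$)|»([^|]|$))", re.I)

MAX_PAGES = 10  # hypothetical cap, not taken from the commit


def depaginate(url, fetch, find_next, seen=None):
    """Follow next-page links recursively, but never revisit a URL and
    never collect more than MAX_PAGES pages."""
    if seen is None:
        seen = set()
    if url in seen or len(seen) >= MAX_PAGES:
        return []  # loop detected or cap reached: stop recursing
    seen.add(url)
    html = fetch(url)
    next_url = find_next(html, url)
    if next_url is None:
        return [html]
    return [html] + depaginate(next_url, fetch, find_next, seen)
```

The visited set stops two pages that link to each other, and the cap stops a chain of ever-new URLs; neither answers the commit's open question of which page is actually at fault.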
Fix another batch of silly mistakes.
* Support non-UTF-8 encodings by actually passing the `encoding` parameter from `main()` to `summary()`.
* Compose depaginated articles together properly and intelligently.
* Give the recursive depagination function a more appropriate name.
Commit dc3e0cd
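The commit does not describe how the pages are merged. The sketch below shows one simple interpretation: every page is decoded with the same caller-supplied encoding (the value `main()` now forwards), passed through a per-page extraction step, and the results are concatenated in order, each in its own container. `compose_article` and its `extract` parameter are hypothetical; `extract` stands in for the per-page extraction that readability's `summary()` performs.

```python
def compose_article(raw_pages, encoding, extract):
    """Decode each page with one shared encoding, extract its article body,
    and join the bodies, in page order, into a single HTML fragment.

    `extract` is a hypothetical text -> HTML callable standing in for the
    real per-page extraction step."""
    bodies = []
    for raw in raw_pages:
        text = raw.decode(encoding)  # raises UnicodeDecodeError on a bad guess
        bodies.append('<div class="page">{}</div>'.format(extract(text)))
    return "<div>\n" + "\n".join(bodies) + "\n</div>"
```

Decoding strictly, rather than with `errors="replace"`, makes a wrong encoding fail loudly instead of silently producing mojibake.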