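This project analyses logs from a dataset to answer 3 questions: which articles are viewed the most, which authors are the most popular, and on which days more than 1% of requests led to errors.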
These instructions will get you a copy of the project up and running on your local machine.
To use this project you need to install some software:
After installing the software, download the database file from here.
After downloading newsdata.zip, unzip it to get newsdata.sql. You need to set up the database with this file before adding the views and before running the program. You can set up the database with this command:
psql news -f newsdata.sql
Then add the views to the database by running the create_views.sql file:
psql news -f create_views.sql
CREATE VIEW most_viewed_articles AS
SELECT title, count(path) as views
FROM articles
LEFT JOIN log ON path like CONCAT('%', slug)
GROUP BY title
ORDER BY views DESC;
CREATE VIEW most_popular_authors AS
SELECT au.name, sum(m.views) AS views
FROM authors AS au
LEFT JOIN articles AS ar ON au.id = ar.author
JOIN most_viewed_articles AS m on m.title = ar.title
GROUP BY au.name
ORDER BY views DESC;
CREATE VIEW days_with_more_than_1_per_errors AS
SELECT to_char(log.time, 'FMMonth DD, YYYY') AS date, TO_CHAR((count(status_4))::FLOAT / COUNT(*) * 100, 'FM990.0') AS status_4_percent
FROM log
LEFT JOIN (SELECT id, time::DATE AS date, status FROM log WHERE status LIKE '4__ %') AS status_4
ON status_4.id = log.id
GROUP BY TO_CHAR(log.time, 'FMMonth DD, YYYY')
-- keep days on which 4xx responses exceed 1% of all requests that day
HAVING COUNT(status_4) > COUNT(*) * 0.01;
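The threshold logic of days_with_more_than_1_per_errors is easy to check by hand. The Python sketch below mirrors it on in-memory data, assuming each log row has been reduced to a (day, status) pair; the function name days_over_error_threshold is hypothetical and not part of this project. A status counts as an error when it has the shape matched by LIKE '4__ %' (a 4, two more characters, then a space).

```python
from collections import defaultdict


def days_over_error_threshold(rows, threshold=0.01):
    """Hypothetical helper: return {day: error_percent} for days whose
    4xx responses exceed `threshold` of all requests on that day.

    `rows` is an iterable of (day, status) pairs, e.g.
    (date(2016, 7, 17), "404 NOT FOUND").
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for day, status in rows:
        totals[day] += 1
        # Same shape as SQL's LIKE '4__ %': '4', two characters, a space, anything.
        if len(status) >= 4 and status[0] == "4" and status[3] == " ":
            errors[day] += 1
    # Strictly greater than the threshold, measured against ALL requests.
    return {
        day: round(errors[day] / totals[day] * 100, 1)
        for day in totals
        if errors[day] > totals[day] * threshold
    }
```

On a day with 1,000 requests, 15 errors (1.5%) qualify, while exactly 10 errors (1.0%) do not, because the condition is strictly greater than 1%.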
Finally, run the main.py Python file with this command:
python3 main.py
main.py will ask you to choose one of the 3 analyses; pick the one you need and try it out. Have an idea for a new analysis? Please send a pull request. ^_^
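The contents of main.py are not shown here. As a hedged sketch of how such a menu could read the three views, the code below works with any DB-API 2.0 connection and assumes psycopg2 as the PostgreSQL driver, connecting with psycopg2.connect(dbname="news"). The names VIEWS, fetch_view, and main are hypothetical, not the project's actual code.

```python
import sys

# Hypothetical mapping from menu choice to (label, view name in create_views.sql).
VIEWS = {
    "1": ("Most viewed articles", "most_viewed_articles"),
    "2": ("Most popular authors", "most_popular_authors"),
    "3": ("Days with more than 1% errors", "days_with_more_than_1_per_errors"),
}
ALLOWED_VIEWS = {name for _, name in VIEWS.values()}


def fetch_view(conn, view_name):
    """Return every row of one known view, using any DB-API 2.0 connection."""
    if view_name not in ALLOWED_VIEWS:
        raise ValueError(f"unknown view: {view_name}")
    cur = conn.cursor()
    try:
        # Identifiers cannot be bound as query parameters, so the name is
        # checked against the fixed allow-list above before being formatted in.
        cur.execute(f"SELECT * FROM {view_name}")
        return cur.fetchall()
    finally:
        cur.close()


def main():
    import psycopg2  # assumed PostgreSQL driver; install it separately

    for key, (label, _) in VIEWS.items():
        print(f"{key}. {label}")
    choice = input("Choose an analysis: ").strip()
    if choice not in VIEWS:
        sys.exit("Invalid choice")
    label, view = VIEWS[choice]
    conn = psycopg2.connect(dbname="news")
    try:
        print(label)
        for row in fetch_view(conn, view):
            print(" | ".join(str(value) for value in row))
    finally:
        conn.close()
```

To run it as a script, call main() under the usual `if __name__ == "__main__":` guard. Keeping the query logic in fetch_view, separate from the menu, lets it be tried against any DB-API connection.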
Anas Almohsen - Anas-95
See also the list of contributors who participated in this project.
Hat tip to anyone whose code was used.