SecLists

seclists_reply_parse.py

Parses seclists.org raw.html files into the following subfiles:

`.reply.body.txt`

Full content of the reply (without the seclists.org wrapper page HTML).

`.reply.title_body.txt`

Title of the reply + full content of the reply.

`.reply.body_no_signature.txt`

Full content of the reply, with an attempt to strip out the signature.

`.reply.title_body_no_signature.txt`

Title of the reply + the body with the signature stripped, as above.

`.reply.body_tags.txt`

Analysis of the tags in the raw.html file, in JSON format:

  * tags: HTML tag types found in the reply, along with their counts
  * sites: domains of sites referenced in the reply, along with their counts
Example: {"tags": {"pre": 2, "a": 1}, "sites": {"pentestmag.com": 1}}
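To make the shape of the tag report concrete, here is a minimal sketch that produces the same JSON structure using only the Python standard library. It is not the script's actual implementation: the class `TagSiteCounter`, the function `analyze_tags`, and the sample fragment are hypothetical, and the sketch assumes that "sites" means the domains of `href` targets on `<a>` tags.

```python
import json
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class TagSiteCounter(HTMLParser):
    """Count start tags and linked domains in an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.sites = Counter()

    def handle_starttag(self, tag, attrs):
        # Every opening tag counts once toward its tag type.
        self.tags[tag] += 1
        if tag == "a":
            # Assumption: a "site" is the network location of a link target.
            href = dict(attrs).get("href") or ""
            domain = urlparse(href).netloc
            if domain:
                self.sites[domain] += 1


def analyze_tags(html_fragment):
    """Return a dict with the same keys as the .reply.body_tags.txt report."""
    counter = TagSiteCounter()
    counter.feed(html_fragment)
    counter.close()
    return {"tags": dict(counter.tags), "sites": dict(counter.sites)}


# Made-up fragment chosen to reproduce the example report above.
sample = '<pre>a</pre><pre>b</pre><a href="http://pentestmag.com/x">link</a>'
print(json.dumps(analyze_tags(sample)))
```

The real script may count tags or resolve domains differently (for example, it may normalise `www.` prefixes), so treat this only as an illustration of the output format.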

Args

-d <directory>, parse entire directory, e.g., -d ./2011_01

-f <filename>, parse single raw file, e.g. -f ./2011_Jan_0.raw.html

Example usage: $ python seclists_reply_parse.py -d ./2011_01

Library

For more flexibility, import this library and use the following functions:

`parse_month_folder(path)`

Parse all .raw.html files in a month directory.

Args:

path: str, directory containing .raw.html files

`parse_reply(filename)`

Parse individual message.

Args:

filename: str, path to a single .raw.html file
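One reason to use the library rather than the command line is to parse only part of a directory. The sketch below is one way to do that. The helpers `find_raw_files` and `parse_selected` are hypothetical and not part of the library; the parse function is passed in as an argument, and its return value is ignored because the README does not document what `parse_reply` returns.

```python
from pathlib import Path


def find_raw_files(directory):
    """Return the .raw.html files in a directory, sorted by name."""
    return sorted(Path(directory).glob("*.raw.html"))


def parse_selected(directory, parse_fn, keep=lambda path: True):
    """Call parse_fn(filename) on each .raw.html file accepted by `keep`.

    Returns the names of the files that were handed to parse_fn.
    """
    parsed = []
    for path in find_raw_files(directory):
        if keep(path):
            parse_fn(str(path))  # parse_reply takes a filename string
            parsed.append(path.name)
    return parsed


# Intended usage, assuming seclists_reply_parse.py is on the import path:
#
#   from seclists_reply_parse import parse_reply
#   parse_selected("./2011_01", parse_reply,
#                  keep=lambda p: p.name.startswith("2011_Jan_"))
```

To parse a whole directory without filtering, calling `parse_month_folder(path)` directly is simpler.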

seclists_index_parse.py

Parses a month index raw.html file into a CSV file. It also pulls data from the referenced replies to obtain full date and author information.

CSV Format

The CSV file contains five columns:

  * id
  * title: subject of the reply
  * date: ISO 8601 timestamp, e.g. 2005-01-05T00:53:02+00:00
  * author: name and email, as supplied by the author
  * parent: the id of the parent thread email; blank if this row is itself a parent thread
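As an illustration of how this CSV might be consumed, the sketch below loads the rows and groups reply ids under their parent thread. It rests on several assumptions: the columns appear in the order listed above, a header row may or may not be present (it is skipped if found), and `parent` always points at the thread's first message. The names `load_index` and `group_by_thread` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Assumed column order, taken from the list above.
COLUMNS = ["id", "title", "date", "author", "parent"]


def load_index(csv_file):
    """Read index rows from an open CSV file into a list of dicts."""
    rows = []
    for record in csv.reader(csv_file):
        # Skip blank lines and a header row, if the file has one.
        if not record or record == COLUMNS:
            continue
        row = dict(zip(COLUMNS, record))
        # Timestamps like 2005-01-05T00:53:02+00:00 parse directly.
        row["date"] = datetime.fromisoformat(row["date"])
        rows.append(row)
    return rows


def group_by_thread(rows):
    """Map each parent thread id to the ids of all messages in that thread."""
    threads = defaultdict(list)
    for row in rows:
        # A blank parent means this row starts its own thread.
        root = row["parent"] or row["id"]
        threads[root].append(row["id"])
    return dict(threads)


# Made-up sample data for illustration only.
sample = io.StringIO(
    "id,title,date,author,parent\n"
    "0,Example subject,2005-01-05T00:53:02+00:00,Alice <alice@example.com>,\n"
    "1,Re: Example subject,2005-01-05T01:10:00+00:00,Bob <bob@example.com>,0\n"
)
rows = load_index(sample)
print(group_by_thread(rows))  # {'0': ['0', '1']}
```

For a real file, pass an open file handle instead of the `io.StringIO` sample, e.g. `open(path, newline="")`.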

Args

-f <filename>, parse single raw file, e.g. -f ./2011_Jan_0.raw.html

Example usage: $ python seclists_index_parse.py -f ./2011_Jan_0.raw.html
