I recently came across CeWL, a tool that spiders a website and gathers its keywords into a wordlist. This can aid password attacks, because the list is made up of terms relevant to the target company's field of expertise.
The only issue was that CeWL can't seem to get behind sites protected by Cloudflare. This is incredibly annoying… so I've made a tool that does.
WLGen currently only scans a single page, but when I have the time I plan to develop it further so that it spiders linked pages too. It can, however, get past Cloudflare using the CFScrape library.
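To give a rough idea of the approach, here is a minimal sketch of the single-page case in Python. It is not WLGen's actual code: the function names `fetch_page` and `extract_words`, the four-character minimum and the frequency ordering are all my own assumptions for illustration. The only Cloudflare-specific part is `cfscrape.create_scraper()`, which returns a requests-style session that attempts to solve the challenge page before returning the real content.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def fetch_page(url):
    """Fetch one page through cfscrape (hypothetical helper, not WLGen's)."""
    # Imported here so the word extraction below works without the library.
    import cfscrape  # third-party: pip install cfscrape

    scraper = cfscrape.create_scraper()  # behaves like a requests.Session
    response = scraper.get(url)
    response.raise_for_status()
    return response.text


def extract_words(html, min_length=4):
    """Return unique lowercase words from the page, most frequent first."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    words = re.findall(r"[A-Za-z]+", text)
    counts = Counter(w.lower() for w in words if len(w) >= min_length)
    # Sort by descending count, then alphabetically, for a stable order.
    return sorted(counts, key=lambda w: (-counts[w], w))
```

Calling `extract_words(fetch_page(url))` on a target URL and writing the result out one word per line would give a basic wordlist. Keeping the fetching and the word extraction separate means the spidering step can be added later without touching the parsing.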
Check it out here for more: https://github.com/secsi/WLGen