You can use wget to generate a list of the URLs on a website.
Spider example.com, writing the URLs to urls.txt and filtering out common asset and image files (css, js, png, etc.):
wget --spider -r http://www.example.com 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\|JPG\)$' > urls.txt
Note that the resulting list contains duplicate URLs.
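If you only want each URL once, one option is to append sort -u to the pipeline (a sketch, assuming GNU coreutils is available):
wget --spider -r http://www.example.com 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\|JPG\)$' | sort -u > urls.txt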
If you mirror instead of spider, you seem to get a more comprehensive list without duplicates:
wget -m http://www.example.com 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\|JPG\)$' > urls.txt
This will download every page of the site into a directory named after the host (www.example.com here).
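If you only care about part of the site, you can limit the crawl; for example, -np (--no-parent) stops wget from ascending above the starting directory (a sketch using a hypothetical /blog/ path):
wget -m -np http://www.example.com/blog/ 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\|JPG\)$' > urls.txt
Once urls.txt has been written, you can delete the mirrored copy with rm -rf www.example.com if the URL list was all you wanted.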