Parse Apache httpd Logs to SQL 10.10.08
Hey guys,
I run an apt repo that serves roughly 50 million package downloads a month. Right now a script syncs the AWStats log to my other servers and, once the sync finishes, reads it to output the total stats (http://apt.modmyrepo.com/).
This doesn't scale, though. A week into the month the logfiles are already 2-3 GB, and by the end of the month they're nearly 10 GB, which takes 45 minutes to sync to my other boxes.
What I’d like to do instead is this:
Have a script that parses the httpd logs directly and writes the necessary info to a SQL db. The script behind the visible page then simply reads from that db, giving instant, real-time stats. I have an existing Perl script that generates the page at apt.modmyrepo.com, and I'd like to keep it, modified slightly to read from the db instead of the logfiles. That modification is part of this job too.
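A minimal sketch of the parsing step, in Python, assuming the logs use Apache's standard "combined" LogFormat and that a "download" means a successful (200) GET of a `.deb` file. The table names (`downloads`, `log_offset`) and the functions `init_db`, `parse_line` and `ingest` are hypothetical. The key idea for scaling is that it stores the byte offset it has reached, so each run reads only the newly appended lines instead of re-reading a multi-GB file.

```python
import re
import sqlite3
from collections import Counter

# Assumes Apache's standard "combined" LogFormat:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) tables if they do not exist yet."""
    # One row per .deb file with a running download count.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloads ("
        "package TEXT PRIMARY KEY, hits INTEGER NOT NULL DEFAULT 0)"
    )
    # Byte offset already processed in each log file.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_offset ("
        "path TEXT PRIMARY KEY, pos INTEGER NOT NULL)"
    )
    conn.commit()


def parse_line(line: str):
    """Return the .deb filename for a successful GET, else None."""
    m = LOG_RE.match(line)
    if not m or m.group("method") != "GET" or m.group("status") != "200":
        return None
    path = m.group("path").split("?", 1)[0]  # drop any query string
    if not path.endswith(".deb"):
        return None
    return path.rsplit("/", 1)[-1]


def ingest(conn: sqlite3.Connection, log_path: str) -> int:
    """Process lines appended since the last run; return downloads counted."""
    row = conn.execute(
        "SELECT pos FROM log_offset WHERE path = ?", (log_path,)
    ).fetchone()
    start = row[0] if row else 0
    counts = Counter()
    with open(log_path, "rb") as f:
        f.seek(0, 2)  # jump to end to learn the current file size
        if f.tell() < start:
            start = 0  # file shrank, so assume it was rotated (crude check)
        f.seek(start)
        pos = start
        while True:
            raw = f.readline()
            if not raw.endswith(b"\n"):
                break  # EOF, or a line still being written: retry next run
            pos += len(raw)
            pkg = parse_line(raw.decode("utf-8", "replace"))
            if pkg:
                counts[pkg] += 1
    # Write the aggregated counts and the new offset in one transaction.
    for pkg, n in counts.items():
        conn.execute(
            "INSERT OR IGNORE INTO downloads (package, hits) VALUES (?, 0)", (pkg,)
        )
        conn.execute(
            "UPDATE downloads SET hits = hits + ? WHERE package = ?", (n, pkg)
        )
    conn.execute(
        "INSERT OR REPLACE INTO log_offset (path, pos) VALUES (?, ?)",
        (log_path, pos),
    )
    conn.commit()
    return sum(counts.values())
```

Run from cron every minute or so: call `init_db(conn)` once, then `ingest(conn, "/var/log/httpd/access_log")` on each run (path hypothetical). The Perl page would then only need something like `SELECT SUM(hits) FROM downloads` for the total. SQLite is used here only to keep the sketch self-contained; the same queries carry over to MySQL or PostgreSQL. The rotation check is deliberately simple and would miss a rotated log that has already grown past the old offset.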
Let me know what you need.