I have written a parser for the Sophos Web Security Appliance, as it is not yet supported by HP ArcSight's ready-made connectors.
I am posting it here so that the Dev community can use and improve it.
Check the following thread for sample raw log:
I have been asked why I did not tokenize further after the file size. My understanding is that Sophos should sanity-check the URL length before writing it to the logs. For example, the ArcSight agent discards the log below, sent by Sophos, because it does not conform to the RFC-recommended syslog size: the ref parameter is long enough to push the message past that size, and nothing on the Sophos side checks for it:
h=10.0.0.1: u="-" s=200 X=+ t=1435163347 T=10422 Ts=0 act=1 cat="0x2200000002" rsn=- threat="-" type="text/plain" ctype="text/html" sav-ev=n/a sav-dv=n/a uri-dv=- cache=- in=13423 out=770 meth=POST ref="hxxp://xym1.ib.adnxs.com/if?e=wqT_3QLDKMg6FAAAAgDWAAUIzrWrrAUQtYbip7bA64QDGImumuiY-KGeGSABKi0JAAAAAAAA8D8RMzMBAgjrPxkFEQwA8D8hARAQMzPrPykREqgw0ZazAjiPCEDACEhSULXkhA5YucMnYABopuwDcAB45y-AAQGKAQNVU0SSBQb0JAGYAawCoAH6AagBALABALgBAsABBcgBAtABANgBAOABAPABAKoCCGczZ2d3cm4xygLQBGh0dHA6Ly91c2VmYi5hZHNydnIub3JnL2JpZC9mZWVkYmFjay9
The agent throws the following warnings for such logs:
-> did not match the common regular expression
-> Received an unexpected event from parser
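A quick way to confirm that size is the culprit is to measure the raw message before it reaches the agent. The snippet below is only a sketch in Python, not ArcSight configuration. It assumes the RFC 3164 limit of 1024 bytes per packet; the post does not say which RFC or threshold the agent applies, so the limit is a parameter, and the function name is made up for illustration.

```python
# Assumption: RFC 3164 (section 4.1) caps the total syslog packet at 1024 bytes.
# The threshold actually enforced by the ArcSight agent may differ.
RFC3164_MAX_BYTES = 1024


def exceeds_syslog_limit(message, limit=RFC3164_MAX_BYTES):
    """Return True if the encoded message is longer than `limit` bytes."""
    return len(message.encode("utf-8")) > limit
```

Running this over a sample of raw Sophos lines would show how many of them are likely to be cut off or rejected because of a long ref value.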
Based on sampling, I have limited tokenization to the fields up to the file size, so that the regex matches as many logs as possible. For now, in my environment, I see no significance in the other parameters in flexstring1 except the target IP, which I could not tokenize because of the problem described above. I have left it to the agent to fill in the target IP on its own by doing a DNS lookup of dom. If needed, I may use sub-messages to tokenize the logs further without them being discarded for failing to match the regex.
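To illustrate the idea of capturing whatever parses cleanly instead of losing the whole event, here is a rough sketch in Python. It is not FlexConnector syntax and not the parser posted here; the function name, the regex, and the shortened sample line are assumptions made for the example. It reads key=value pairs from left to right and stops at the first token that does not parse, such as a quoted ref value that was cut off.

```python
import re

# One key=value token. The value is either a complete quoted string or a run
# of non-space characters that does not start with a quote.
# Keys may contain hyphens (for example sav-ev).
TOKEN = re.compile(r'([A-Za-z][\w-]*)=("[^"]*"|[^\s"]\S*)(?=\s|$)')


def tokenize_prefix(line):
    """Return (fields, rest): the pairs parsed so far and the unparsed tail."""
    fields = {}
    pos = 0
    while pos < len(line):
        # Skip whitespace between tokens.
        while pos < len(line) and line[pos].isspace():
            pos += 1
        match = TOKEN.match(line, pos)
        if match is None:
            break  # malformed or truncated token: keep what we have
        key, value = match.group(1), match.group(2)
        if value.startswith('"'):
            value = value[1:-1]  # drop the surrounding quotes
        fields[key] = value
        pos = match.end()
    return fields, line[pos:]


# Hypothetical, shortened line with a ref value that never closes its quote.
sample = 'h=10.0.0.1 u="-" s=200 in=13423 out=770 meth=POST ref="hxxp://example.invalid/if?e=wqT_3QLD'
fields, rest = tokenize_prefix(sample)
```

Here fields holds everything up to meth, and rest holds the broken ref token, which could be stored as-is or handed to a second pass, much like a sub-message would be.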