Easy one for those who know their regexes well! If I have one long string that contains a messy HTML page, I want to strip out:
1: All HTML comments
2: All JavaScript comments
3: All whitespace that can be removed
.. without breaking anything. It sounds easy enough, but there are some odd quirks that make it more challenging than I want to handle :) See the full spec for the way I see it...
Cheers, C
## Deliverables
Here's the HTML comments regex:
$buffer = preg_replace('/<!--.*?-->/s', '', $buffer);
Problem 1: If you follow that and you know HTML, you'll know that it works fine unless you're inside a tag pair that uses comments to hide its content from older browsers that don't understand the tag (the most common, I guess, being {script} and {style} tags).
Problem 2: JavaScript comments that appear after a line of code rather than on a line of their own, e.g.:
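One way Problem 1 could be approached, as a rough sketch rather than a finished answer: match whole {script} and {style} elements first and hand them back unchanged, so only comments outside them are removed. The function name `strip_html_comments` is made up for illustration, and the code assumes PHP 7 or later. It does not try to preserve IE conditional comments, and very large pages may hit PCRE's backtrack limit, in which case the input is returned untouched.

```php
<?php
// Hypothetical helper, not part of the original brief.
function strip_html_comments(string $html): string
{
    // Alternative 1 (group 1): a complete <script> or <style> element, kept as-is.
    // Alternative 2: an HTML comment anywhere else, which gets removed.
    // Flags: i = case-insensitive tags, s = dot also matches newlines.
    $pattern = '~(<(script|style)\b[^>]*>.*?</\2\s*>)|<!--.*?-->~is';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Group 1 exists only when a protected element matched.
            return $m[1] ?? '';
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}
```

Because the regex engine scans left to right, a comment that sits inside a {script} or {style} block is swallowed by the first alternative before the second one ever gets a chance at it.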
var problemVar; // this variable is commented
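For Problem 2, a possible sketch (again with a made-up function name, `strip_js_comments`, and assuming PHP 7+): match quoted string literals first and keep them, so that a `//` inside something like "http://..." is not mistaken for a comment, then drop block comments and line comments. This is meant to be run on the contents of a {script} block only, not on the whole page, and it does not understand JavaScript regex literals or template strings, so treat it as a starting point.

```php
<?php
// Hypothetical helper, not part of the original brief.
function strip_js_comments(string $js): string
{
    // Double- or single-quoted string literal, allowing backslash escapes.
    $string = '"(?:\\\\.|[^"\\\\])*"|\'(?:\\\\.|[^\'\\\\])*\'';
    // Block comment: /* ... */ (may span lines thanks to the s flag).
    $block = '/\*.*?\*/';
    // Line comment: from // up to, but not including, the end of the line.
    $line = '//[^\r\n]*';

    $pattern = '~(' . $string . ')|' . $block . '|' . $line . '~s';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Group 1 is a string literal: keep it. Anything else is a comment.
            return $m[1] ?? '';
        },
        $js
    );

    // Null means a PCRE error; return the input unchanged in that case.
    return $result ?? $js;
}
```

The newline after a line comment is deliberately left in place, since removing it could glue two statements together.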
Problem 3: Whitespace that appears inside {pre} tags needs to remain untouched (I don't know if there are any others like that).
Those are the problems that I can think of; there might be more. Notice the brief says "without breaking anything"! :)
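A sketch of how Problem 3 might be handled, using the same "match the protected element first" trick. The function name `collapse_whitespace` is hypothetical and PHP 7+ is assumed. Besides {pre}, this version also leaves {textarea}, {script} and {style} alone: whitespace in a {textarea} is significant, and collapsing newlines inside JavaScript can change its meaning. Everywhere else, each run of whitespace becomes a single space, which is conservative (it never removes whitespace entirely, since a space between inline elements can be visible).

```php
<?php
// Hypothetical helper, not part of the original brief.
function collapse_whitespace(string $html): string
{
    // Alternative 1 (group 1): an element whose whitespace must be preserved.
    // Alternative 2: any run of whitespace elsewhere.
    $pattern = '~(<(pre|textarea|script|style)\b[^>]*>.*?</\2\s*>)|\s+~is';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Protected element: return it untouched. Otherwise: one space.
            return $m[1] ?? ' ';
        },
        $html
    );

    // Null means a PCRE error; return the input unchanged in that case.
    return $result ?? $html;
}
```

If comments are also being stripped, it is probably safer to do that first and collapse whitespace last, so that gaps left behind by removed comments get tidied up too.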
Nice regex problems if anyone fancies them anyway :)
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Complete ownership and distribution copyrights to all work purchased.
## Platform
PHP / Linux / use PCRE (the preg_* functions), not PHP's own regex stuff