markup - Is it acceptable for invalid XHTML?


I've noticed a lot of sites, SO included, use XHTML as their mark-up language and then fail to adhere to the spec. Just browsing the source for SO there are missing closing tags for paragraphs, invalid elements, etc.

So should tools (and developers) use the XHTML doctype if they are going to produce invalid mark up? And should browsers be more firm in their acceptance of poor mark-up?

And before anyone shouts hypocrite, my blog has one piece of invalid mark-up involving the captha (or it did the last time I checked) which involves styling the noscript tag.

  • Translate

    There are many reasons to use valid markup. My favorite is that it allows you to use validation as a form of regression testing, preventing the markup equivalent of "delta rot" from leading to real rendering problems once the errors reach some critical mass. And really, it's just plain sloppy to allow "lazy" errors like typos and mis-nested/unclosed tags to accumulate. Valid markup is one way to identify passionate programmers.

    There's also the issue of debugging: valid markup also gives you a stable baseline from which to work on the inevitable cross-browser compatibility woes. No web developer who values his time should begin debugging browser compatibility problems without first ensuring that the markup is at least syntactically valid—and any other invalid markup should have a good reason for being there.

    (Incidentally, fails both these tests, and suggestions to fix the problems were declined.)

    All of that said, to answer your specific question, it's probably not worthwhile to use one of the XHTML doctypes unless you plan to produce valid (or at least well-formed) markup. XHTML's primary advantages are derived from the fact that XHTML is XML, allowing it to be processed and transformed by tools and technologies that work with XML. If you don't plan to make your XHTML well-formed XML, then there's little point in choosing that doctype. The latest HTML 4 spec will probably do everything you need, and it's much more forgiving.

  • Translate

    We should always try to make it validate according to standards. We'll be sure that the website will display and work fine on current browsers AND future browsers.

  • Translate

    I don't think that, if you specify a doctype, there is any reason not to adhere to this doctype.

    Using XHTML makes automated error detection easy, every change can be automatically checked for invalid markup. This prevents errors, especially when using automatically generated content. It is really easy for a web developer using a templating engine (JSP, ASP.NET StringTemplate, etcetera) to copy/paste one closing tag too little or too many. When this is your only error, it can be detected and fixed immediately. I once worked for a site that had 165 validation errors per page, of which 2 or 3 were actual bugs. These were hard to find in the clutter of other errors. Automatic validation would have prevented these errors at the source.

    Needless to say, choosing a standard and sticking to it can never benefit interoperability with other systems (screen scrapers, screen readers, search engines) and I have never come across a situation where a valid semantic XHTML with CSS solution wasn't possible for all major browsers.

    Obviously, when working with complex systems, it's not always possible to stick to your doctype, but this is mostly a result of improper communication between the different teams developing different parts of these systems, or, most likely, legacy systems. In the last case it's probably better to isolate these cases and change your doctype accordingly.

    It's good to be pragmatic and not adhere to XHTML just because someone said so, regardless of costs, but with current knowledge about CSS and browsers, testing and validation tools, most of the time the benefits are much greater than the costs.

  • Translate

    You can say that I have an OCD on XHTML validity. I find that most of the problems with the code not being valid comes from programmers not knowing the difference between HTML and XHTML. I've been writing 100% valid XHTML and CSS or a while now and have never had any major rendering problems with other browsers. If you keep everything valid, and don't try anything too exotic css wise, you will save yourself a ton of time in fixes.

  • Translate

    I wouldn't use XHTML at all just to save myself the philosophical stress. It's not like any browsers are treating it like XHTML anyway.

    Browsers will reject poor mark-up if the page is sent as application/xhtml+xml, but they rarely are. This is fine.

    I would be more concerned about things like inline use of CSS and JavaScript with Stack Overflow, just because they make maintenance harder.

  • Translate

    Though I believe in striving for valid XHTML and CSS, it's often hard to do for a number of reasons.

    • First, some of the content could be loaded via AJAX. Sometimes, fragments are not properly inserted into the existing DOM.
    • The HTML that you are viewing may not have all been produced in the same document. For example, the page could be made of up components, or templates, and then thrown together right before the browser renders it. This isn't an excuse, but you can't assume that the HTML you're seeing was hand coded all at once.
    • What if some of the code generated by Markdown is invalid? You can't blame Stack Overflow for not producing valid code.
    • Lastly, the purpose of the DOCTYPE is not to simply say "Hey, I'm using valid code" but it's also to give the browser a heads up what you're trying to do so that it can at least come close to correctly parsing that information.

    I don't think that most developers specify a DOCTYPE and then explicitly fail to adhere to it.

  • Boyd Lee

    while I agree with the sentiment of "if it renders fine then don't worry about it" statement, however it's good for follow a standard, even though it may not be fully supported right now. you can still use Table for layout, but it's not good for a reason.

  • Translate

    No, you should not use XHTML if you can't guarantee well-formedness, and in practice you can't guarantee it if you don't use XML serializer to generate markup. Read about producing XML.

    Well-formedness is the thing that differentiates XHTML from HTML. XHTML with "just one" markup error ceases to be XHTML. It has to be perfect every time.

    If "XHTML" site appears to work with some errors, it's because browsers ignore the DOCTYPE and interpret page as HTML.

    See XHTML proxy that forces interpretation of pages as XHTML. Most of the time they fail miserably. This is one of the reason why future of XHTML is uncertain and why development of HTML has been resumed.

  • Translate

    It depends. I had that issue with my blog where a YouTube video caused invalid XHTML, but it rendered fine. On the other hand, I have a "Valid XHTML" link, and a combination of a "Valid XHTML" claim and invalid XHTML is not professional.

    As SO does not claim to be valid, I think it's acceptable, but personally if I were Jeff i would be bothered and try to fix it even if it looks good in modern browsers, but some people rather just move on and actually get things done instead of fixing non-existent bugs.

  • Translate

    So long as it works in IE, FF, Safari, (insert other browser here) you should be okay. Validation isn't as important as having it render correctly in multiple browsers. Just because it is valid, doesn't mean it'll work in IE properly, for instance.

    Run Google Analytics or similar on your site and see what kind of browsers your users are using and then judge which browsers you need to support the most and worry about the less important ones when you have the spare time to do so.

  • Translate

    I say, if it renders OK, then it doesn't matter if it's pixel perfect.

    It takes a while to get a site up and running the way you want it, going back and making changes is going to change the way the page renders slightly, then you have to fix those problems.

    Now, I'm not saying you should built sloppy web pages, but I see no reason to fix what ain't broke. Browsers aren't going to drop support for error correction anytime in the near future.

  • Translate

    I don't understand why everyone get caught up trying to make their websites fit the standard when some browsers sill have problems properly rendering standard code. I've been in web design for something like 10 years and I stopped double codding (read: hacking css), and changing stupid stuff just so I could put a button on my site.

    I believe that using a < div> will cause you to be invalid regardless, and it get a bit harder to do any major JavaScript/AJAX without it.

  • Translate

    There are so many standards and they are so badly "enforced" or supported that I don't think it matters. Don't get me wrong, I think there should be standards but because they are not enforced, nobody follows them and it's a massive downward spiral.

  • Translate

    For 99.999% of the sites out there, it really won't matter. The only time I've had it matter, I ran the HTML input through HTMLTidy to XHTML-ize it, and then ran my processing on it.

    Pretty much, it's the old programmer's axiom: trust no input.