javascript - How can I turn a string of HTML into a DOM object in a Firefox extension?


I'm downloading a web page (tag soup HTML) with XMLHttpRequest, and I want to turn the response text into a DOM object that I can then run XPath queries on. How do I convert a string into a DOM object?

It appears that the general solution is to create a hidden iframe and write the contents of the string into it. There has been talk of updating DOMParser to support text/html, but as of Firefox 3.0.1 you still get NS_ERROR_NOT_IMPLEMENTED if you try.

Is there any option besides using the hidden iframe trick? And if not, what is the best way to do the iframe trick so that your code works outside the context of any currently open tabs (so that closing tabs won't screw up the code, etc)?

This is an example of why I'm looking for a solution other than the iframe hack: if I have to write all that code to get a robust solution, I'd rather keep looking for something else.


All Answers
  •

    Ajaxian actually had a post today on inserting HTML into an iframe and retrieving it again. You can probably use the JavaScript snippet they posted there.

    As for handling the closing of a browser window or tab, you can attach a handler to the onbeforeunload event (http://msdn.microsoft.com/en-us/library/ms536907(VS.85).aspx) and do whatever cleanup you need there.
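
    The iframe approach discussed above can be sketched roughly as follows. This is only a minimal sketch, not the Ajaxian snippet: the function name parseHtmlInHiddenIframe is hypothetical, and it assumes `doc` is an HTML document with a body. Note that writing untrusted markup into an iframe lets any scripts in it run, so treat this with care in privileged code.

```javascript
// Sketch: parse an HTML string by writing it into a hidden iframe.
// Assumption: `doc` is an HTML document that has a <body>.
function parseHtmlInHiddenIframe(doc, html) {
  var iframe = doc.createElement('iframe');
  iframe.style.display = 'none';
  // The iframe only gets a contentDocument once it is attached.
  doc.body.appendChild(iframe);

  var inner = iframe.contentDocument;
  inner.open();
  inner.write(html); // caution: scripts contained in `html` will execute
  inner.close();

  return {
    document: inner,
    // Call this when done (or from an unload handler) to remove the iframe.
    dispose: function () {
      if (iframe.parentNode) {
        iframe.parentNode.removeChild(iframe);
      }
    }
  };
}
```

    The returned dispose function is the kind of cleanup you could call from an onbeforeunload handler.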


  •

    Try this:

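    var request = new XMLHttpRequest();

    // Force the response to be parsed as XML, whatever Content-Type the server sends.
    request.overrideMimeType('text/xml');
    request.onreadystatechange = process;
    request.open('GET', url); // url: address of the page to download
    request.send(null);

    // Called on every state change of the request.
    function process() {
        if (request.readyState == 4 && request.status == 200) {
            // responseXML holds the parsed document.
            var xml = request.responseXML;
        }
    }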
    

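    Note the call to overrideMimeType and the use of responseXML.
    A readyState of 4 means the request has completed. Keep in mind that this relies on the XML parser, so a response that is not well-formed XML will not give you a usable document.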
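
    Since tag soup may not survive the XML parser, it can help to check the result before using it. Here is a minimal sketch of such a check. The function name usableResponseXml is hypothetical, and the `parsererror` test assumes Mozilla's behaviour of handing back an error document whose root element is called parsererror.

```javascript
// Sketch: return the parsed document from a finished request,
// or null if the request failed or the body was not well-formed XML.
function usableResponseXml(request) {
  if (request.readyState !== 4 || request.status !== 200) {
    return null; // not finished, or not a successful response
  }
  var doc = request.responseXML;
  if (!doc || !doc.documentElement) {
    return null; // nothing was parsed
  }
  // Assumption: a failed parse yields a document rooted at <parsererror>.
  if (doc.documentElement.nodeName === 'parsererror') {
    return null;
  }
  return doc;
}
```

    A null result would be the signal to fall back to an HTML-aware approach instead.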


  •

    Try creating a div:

    var div = document.createElement('div');
    

    Then assign the tag soup HTML to the div's innerHTML. The browser should parse it into DOM nodes, which you can then query.

    The innerHTML property takes a string that specifies a valid combination of text and elements. When the innerHTML property is set, the given string completely replaces the existing content of the object. If the string contains HTML tags, the string is parsed and formatted as it is placed into the document.
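
    Put together, the div approach plus an XPath query might look like the sketch below. The function names are hypothetical, and it assumes `doc` is an HTML document that supports document.evaluate (in a XUL document, createElement would not give you an HTML div). Because the div is never attached to the page, the XPath expression should be relative to it, for example './/a'.

```javascript
// Sketch: let the HTML parser turn a string into nodes inside a detached div.
// Assumption: `doc` is an HTML document.
function htmlToContainer(doc, html) {
  var div = doc.createElement('div');
  div.innerHTML = html; // html/head/body wrappers in the string are dropped
  return div;
}

// Run an XPath expression relative to contextNode and return the matches
// as a plain array.
function queryXPath(doc, contextNode, expression) {
  var ORDERED_NODE_SNAPSHOT_TYPE = 7; // value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  var result = doc.evaluate(expression, contextNode, null,
                            ORDERED_NODE_SNAPSHOT_TYPE, null);
  var nodes = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

    Usage would be along the lines of queryXPath(document, htmlToContainer(document, responseText), './/a').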


  •

    So you want to download a web page as an XML object using JavaScript, but you don't want to use a web page? Since you have no control over what the user will do (closing tabs, windows, and so on), you would need to do this in something like an OS X Dashboard widget or a separate application. A Firefox extension would also work, unless you have to worry about the user closing the browser.


  •

    Is there any option besides using the hidden iframe trick?

    Unfortunately, no, not now. Otherwise the microsummary code you point to would use it instead.

    And if not, what is the best way to do the iframe trick so that your code works outside the context of any currently open tabs (so that closing tabs won't screw up code, etc)?

    The code you quoted uses the most recent browser window, so closing tabs won't affect parsing. Closing that browser window will abort your load, but that doesn't happen very often and you can deal with it (for example, detect that the load was aborted and restart it in another window).

    You need a DOM window for the iframe to work properly, so there's no clean solution at the moment (if you're keen on using the Mozilla parser).
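
    For what it's worth, the NS_ERROR_NOT_IMPLEMENTED failure mentioned in the question can be detected at runtime, and later browser releases did add text/html support to DOMParser. So one option is to try DOMParser first and use the iframe only when that fails. A minimal sketch, where parseHtmlOrNull and its ParserCtor parameter are hypothetical names (you would pass in DOMParser):

```javascript
// Sketch: try DOMParser with text/html; return null when it is unsupported
// so the caller can fall back to the hidden iframe.
// Assumption: ParserCtor is a DOMParser-compatible constructor.
function parseHtmlOrNull(html, ParserCtor) {
  try {
    var doc = new ParserCtor().parseFromString(html, 'text/html');
    // Some implementations return null instead of throwing.
    return doc || null;
  } catch (e) {
    // For example NS_ERROR_NOT_IMPLEMENTED where text/html is unsupported.
    return null;
  }
}
```

    A null return means "not supported here", which keeps the fallback decision in one place.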