How to curl or wget a web page?



I would like to make a nightly cron job that fetches my Stack Overflow page and diffs it against the previous day's page, so I can see a change summary of my questions, answers, ranking, etc.

Unfortunately, I couldn't get the right set of cookies, etc., to make this work. Any ideas?

Also, when the beta is finished, will my status page be accessible without logging in?


All answers
  • Your status page is available now without logging in (click logout and try it). When the beta cookie is disabled, there will be nothing between you and your status page.

    For wget:

    # --no-cookies turns off wget's own cookie handling; the soba session cookie is passed explicitly
    wget --no-cookies --header "Cookie: soba=(LookItUpYourself)" https://stackoverflow.com/users/30/myProfile.html
    

  • From Mark Harrison

    And here's what works...

    # -s silences curl's progress meter; --cookie sends the soba session cookie with the request
    curl -s --cookie soba=. https://stackoverflow.com/users

    And for wget:

    wget --no-cookies --header "Cookie: soba=(LookItUpYourself)" https://stackoverflow.com/users/30/myProfile.html
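
    To turn this into the nightly diff from the question, here's a minimal sketch of the cron job (the script path and file names are placeholders, and the real soba cookie value has to be substituted):

    #!/bin/sh
    # nightly fetch-and-diff sketch; substitute your real soba cookie value for "."
    curl -s --cookie soba=. https://stackoverflow.com/users/30/myProfile.html > today.html
    touch yesterday.html                          # so the first run has a baseline to diff against
    diff yesterday.html today.html > changes.txt  # the change summary to review
    mv today.html yesterday.html                  # today's page becomes tomorrow's baseline

    And a crontab entry to run it every night:

    # run at 02:00 every night
    0 2 * * * /path/to/so-diff.sh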
    

  • Nice idea :)

    I presume you've tried wget's

    --load-cookies (filename)

    option? It might help a little, but it might be easier to use something like Mechanize (in Perl or Python) to mimic a browser more fully and get a good spider.
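
    If you do go the wget route, a minimal sketch, assuming you've exported your browser's cookies to a Netscape-format cookies.txt (both file names are placeholders):

    # reuse exported browser cookies instead of hand-crafting the Cookie header
    wget --load-cookies cookies.txt -O myProfile.html https://stackoverflow.com/users/30/myProfile.html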


  • I couldn't figure out how to get the cookies to work either, but I was able to get to my status page in my browser while I was logged out, so I assume this will work once stackoverflow goes public.

    This is an interesting idea, but won't you also pick up diffs of the underlying HTML? Do you have a strategy to avoid ending up with a diff of the markup rather than the actual content?
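
    One way to sidestep that would be to render the page to plain text before diffing; a minimal sketch, assuming lynx is installed (the cookie value and file names are placeholders):

    # render the fetched HTML to text so the diff reflects content, not markup churn
    curl -s --cookie soba=. https://stackoverflow.com/users/30/myProfile.html | lynx -dump -stdin > today.txt
    diff yesterday.txt today.txt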

