Q1318 Is there a way of screen scraping the information on an HTML page using JavaScript?

In Internet Explorer use:

<iframe frameborder=0 width="0" height="0" marginheight=0 marginwidth=0 NAME="iframe" scrolling=no src="page_to_be_scrapped.htm"></iframe>

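<script language="JavaScript"><!--
// Wait until the page and its iframe have loaded before reading the contents.
// This only works when the framed page comes from the same domain as this page.
window.onload = function () {
    if (window.frames.length > 0) {
        alert(window.frames['iframe'].document.body.innerHTML);
    }
};
//--></script>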
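
Once the markup is available as a string (as with innerHTML above), the text can be pulled out of it with ordinary string handling. The following is a rough sketch only: stripTags is a hypothetical helper name, not part of any browser API, and a regular expression like this is easily fooled by script blocks, comments, or a ">" inside an attribute value.

```javascript
// Hypothetical helper: crude removal of tags from an HTML string.
// Assumes simple, well-formed markup; entities such as &amp; are left as-is.
function stripTags(html) {
    return html
        .replace(/<[^>]*>/g, ' ')   // replace every tag with a space
        .replace(/\s+/g, ' ')       // collapse runs of whitespace
        .trim();                    // drop leading and trailing whitespace
}

// Made-up sample standing in for document.body.innerHTML.
var sampleText = stripTags('<p>Hello <b>world</b></p>');  // "Hello world"
```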

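In Netscape Navigator, go to http://jshelper.pharlap.com and follow the instructions for its server-side helper. In the example below, the helper script makes the fetched page available in the FileContents variable: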

<html>
<head>
<title></title>
<script language="JavaScript" src="http://jshelper.pharlap.com/netutils/httpget.js?http://www.nytimes.com/"></script>

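<script language="JavaScript" type="text/javascript">
// FileContents is set by httpget.js above and holds the fetched page as a string.
function scrapeHeadlines() {
    var searchStart = "<NYT_HEADLINE>";
    var searchEnd = "</NYT_HEADLINE>";
    // Element 0 is the text before the first headline, so the loop starts at 1.
    var aNews = FileContents.split(searchStart);
    for (var i = 1; i < aNews.length; i++) {
        // Keep only the text before the closing marker.
        var aHeadlineOnly = aNews[i].split(searchEnd);
        document.write(aHeadlineOnly[0] + "<br>");
    }
}
</script>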
</head>

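<body>

<b><u>The headlines are:</u></b><br><br>

<script language="JavaScript" type="text/javascript">
// Call while the page is still loading: document.write() after onLoad would replace the page.
scrapeHeadlines();
</script>

</body>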
</html>
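
The split-based extraction used in scrapeHeadlines() can be tried on its own, without the helper script or a browser. This is a minimal sketch under stated assumptions: extractBetween is a hypothetical name, and the sample string is made up to stand in for FileContents.

```javascript
// Hypothetical helper: returns the text found between each
// startMarker/endMarker pair in the given string.
function extractBetween(text, startMarker, endMarker) {
    var results = [];
    var parts = text.split(startMarker);
    // parts[0] is everything before the first marker, so start at 1.
    for (var i = 1; i < parts.length; i++) {
        var end = parts[i].indexOf(endMarker);
        // Skip a chunk whose closing marker is missing.
        if (end !== -1) {
            results.push(parts[i].substring(0, end));
        }
    }
    return results;
}

// Made-up sample input standing in for FileContents.
var sample = "<p><NYT_HEADLINE>First story</NYT_HEADLINE></p>" +
             "<p><NYT_HEADLINE>Second story</NYT_HEADLINE></p>";
var headlines = extractBetween(sample, "<NYT_HEADLINE>", "</NYT_HEADLINE>");
```

Returning an array rather than writing straight into the page keeps the extraction separate from the display, so the same function could feed document.write(), an alert, or further processing.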

Alternatively, in Netscape Navigator, use a signed script and LiveConnect to read the page through Java:

<script language="JavaScript" type="text/javascript">
function fetchURL(url) {
    // Request extra privileges when the URL is not on the host (or, for
    // local files, the protocol) that this page was loaded from.
    if ((location.host == '' && url.indexOf(location.protocol) == -1)  ||
       url.indexOf(location.host) == -1) {
        netscape.security.PrivilegeManager.enablePrivilege('UniversalConnect');
    }
    // Open the URL through Java and read it line by line.
    var dest = new java.net.URL(url);
    var dis = new java.io.DataInputStream(dest.openStream());
    var res = '';
    var line;
    while ((line = dis.readLine()) != null) {
        res += line;
        res += java.lang.System.getProperty('line.separator');
    }
    dis.close();
    return res;
}

alert(fetchURL(location.href));
</script>

The script must be signed, or otherwise trusted, to fetch pages from any location other than the one the script itself was loaded from.
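
The condition at the top of fetchURL() decides when the extra privilege is requested. It can be written as a separate function and tried outside the browser. This is a sketch only: needsPrivilege is a hypothetical name, and its first two parameters stand in for location.protocol and location.host. Like the original, it is a plain substring test, not a rigorous comparison of origins.

```javascript
// Hypothetical helper mirroring the test at the top of fetchURL().
// pageProtocol and pageHost stand in for location.protocol and location.host.
function needsPrivilege(pageProtocol, pageHost, url) {
    if (pageHost === '') {
        // Page loaded from disk, so there is no host: compare protocols instead.
        return url.indexOf(pageProtocol) === -1;
    }
    // Otherwise the URL must mention the host the page came from.
    return url.indexOf(pageHost) === -1;
}

// Made-up example values.
var sameHost = needsPrivilege('http:', 'irt.org', 'http://irt.org/page.htm');
var otherHost = needsPrivilege('http:', 'irt.org', 'http://www.nytimes.com/');
var localToRemote = needsPrivilege('file:', '', 'http://www.nytimes.com/');
```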



Last Updated: 30th March 2008. Maintained by: Martin Webb and Michel Plungjan
Copyright © 1996-2008 irt.org, All Rights Reserved.