Forget e-mail spamming for a moment; there are a lot of other things you can do with these tools, e.g. scraping the 'wall' of ebay.in's Facebook community for all of its posts. The following Python code shows how this can be done with the help of BeautifulSoup.
If you look at the source of the web page, you'll see that every post on the wall sits inside the tag span class="UIStory_Message". So we have to parse the page to find all the 'span' tags whose 'class' attribute is set to 'UIStory_Message'. The method 'bs.findAll', shown below, does exactly that. We may also want to print the name of each post's author before the post. From the HTML source, you can see that these names are available as text under the tag span class="UIIntentionalStory_Names". This tag comes just before our post tag span class="UIStory_Message". As we already have references to all the post tags, we can reach each name tag by calling 'findPreviousSibling'; the name itself is one level deeper, under the 'a' tag. Finally, we call the 'getText' method to get the name of the post's author.
If this is confusing, please see the official BeautifulSoup documentation.
#!/usr/bin/env python
__author__ = "Kasi Viswanadh Yakkala"

from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup

def ebayin_fb_parse():
    # Fetch the wall page and parse it
    frontpage = urlopen("http://www.facebook.com/ebaydotin?v=wall").read()
    bs = BeautifulSoup(frontpage)
    # Every post lives in a <span class="UIStory_Message">
    fbstories = bs.findAll(name='span', attrs={'class': 'UIStory_Message'})
    for s in fbstories:
        # The author's name is in the <a> inside the preceding names span
        fbprofile_name = s.findPreviousSibling(name='span', attrs={'class': 'UIIntentionalStory_Names'}).a.getText()
        print fbprofile_name, ':'
        try:
            print s.getText()
        except Exception:
            # Fall back to the first text node if getText() fails
            print s.find(text=True)

# HOW TO USE: run this script directly
# Main function
if __name__ == "__main__":
    ebayin_fb_parse()
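For readers on Python 3, the same traversal can be sketched with the newer bs4 package. Treat this as a rough sketch rather than a drop-in replacement: it assumes bs4 is installed, and SAMPLE_HTML and parse_wall are hypothetical names. The markup is made up and only mimics the span structure described above, so the example runs without fetching the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics the structure described above:
# a names span immediately followed by the message span.
SAMPLE_HTML = """
<div>
  <span class="UIIntentionalStory_Names"><a href="#">Alice</a></span>
  <span class="UIStory_Message">Great deals this week!</span>
</div>
<div>
  <span class="UIIntentionalStory_Names"><a href="#">Bob</a></span>
  <span class="UIStory_Message">When does the sale end?</span>
</div>
"""


def parse_wall(html):
    """Return a list of (author, message) pairs found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    # Find every post span, then walk back to the names span before it.
    for story in soup.find_all("span", class_="UIStory_Message"):
        names = story.find_previous_sibling(
            "span", class_="UIIntentionalStory_Names"
        )
        # Guard against posts that have no preceding names span or link.
        author = names.a.get_text(strip=True) if names and names.a else None
        posts.append((author, story.get_text(strip=True)))
    return posts


if __name__ == "__main__":
    for author, message in parse_wall(SAMPLE_HTML):
        print(author, ":", message)
```

Returning the pairs instead of printing them inside the loop keeps the parsing separate from the output, and the None guard avoids an AttributeError when a post has no names span before it.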