
DataConnect.be blog is up.

I’ve set up a new blog connected to my programming efforts. It’s on DataConnect.be, the website I set up to store any tools/programs I’m making.

It’s a bit empty right now, and several links are still “dead”, like for example the demo, but I’m going to be adding more blog posts there that are program-related. I’ll note here any new posts that appear there.

On the progress front :

  • I’ve made some progress on following keywords, storing them, and collecting twitter user information. Everything is running stable on my laptop, in text version.
  • jQuery and jQuery UI still rock. I’ve now found out how modal dialog boxes can be called up in an ajaxy way, which is really cool ! Coupled with CherryPy, I can initiate a dialog box that calls a page that does the db update and shows the results… really nice and much more intuitive for the user than being thrown onto another page that tells you it’s done and you need to click “return” to see what changed.
  • in the following weeks I’ll be setting up a database, some keywords and a cron job on my webfaction host – I’ll need to see how they behave and if there are problems or tweaking to be done
  • still to do: setting up the graphics to show what I’m collecting, after all, text is not sexy.
  • after that, it’ll be writing the demo pages so that at least I can show what I’m doing
  • after that, the scary part comes – asking for feedback… :-)
  • further than that I can’t or won’t look at the moment

I also discovered quite a few programs that analyse Twitter, and for some time I got into a funk about one in particular, called HootSuite. It is really very, very good ! It’s also very free, and that’s a bit worrisome for my idea of trying to earn money from my Twitter analysis program if there are such good and great programs that are free. Obviously, a lot of care went into that program, and it’s very smooth and intuitive to use. Highly recommended for getting a good overview of your tweets and following up on who said what. There are other programs out there that I admire (SocialOomph is one, and it does ask money for the pro version).

That said, my focus lies on the keywords and not on the individual tweets. What I want to implement is different from what they are doing, so I still see added value for what I’m doing.

Using CherryPy for webform authentication

If you are using CherryPy, I can recommend the webform-based authentication that Arnar Birgisson wrote for ease of use and extensibility.

After trying out the authentication models included with CherryPy (I’m using 3.1.2, the latest stable version at the moment of writing), I was disappointed in the results. Then I stumbled over a recommendation from someone on Nabble, a web-based programmer’s discussion forum, which pointed to the following wiki page on the CherryPy site:

http://tools.cherrypy.org/wiki/AuthenticationAndAccessRestrictions

The complete program code plus examples are on the page and are well explained.

You can have a skeleton login system (using a hardcoded dictionary) up and running in literally half an hour !

  • Just copy/paste the code on the page and save it as auth.py in your cherrypy script dir.
  • Add the hardcoded dictionary containing usernames and passwords to it (or script the db access, see the example included)
  • Put ‘require()’ everywhere on your cherrypy pages that need to have login protection – additionally, you can also have roles so that only admins can access certain pages.
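As a rough illustration of the hardcoded-dictionary step, here is a minimal sketch in plain Python 3 with no CherryPy involved. The names `USERS` and `check_credentials`, the sample accounts, and the convention of returning `None` on success are assumptions made for this illustration, not code copied from the wiki page. Plain-text passwords like these are only acceptable for a throwaway skeleton.

```python
import hmac

# Hypothetical hardcoded user table: username -> plain-text password.
USERS = {
    "alice": "wonderland",
    "bob": "builder",
}

ERROR = "Incorrect username or password."

def check_credentials(username, password):
    """Return None if the login is valid, else an error message (assumed convention)."""
    expected = USERS.get(username)
    if expected is None:
        return ERROR
    # compare_digest avoids leaking information through comparison timing
    if hmac.compare_digest(expected.encode("utf-8"), password.encode("utf-8")):
        return None
    return ERROR

print(check_credentials("alice", "wonderland"))
print(check_credentials("alice", "nope"))
```

A login page would call a function like this with the submitted form values and only mark the session as logged in when it gets `None` back.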

Early last week I replaced that hardcoded dictionary and built the db lookup query for the login. Once that was working, I added a ‘my profile’ page to the application I’m working on. Then I thought it would be nice for the admin to have a ‘create user’ form in the admin section to add users. I’ve done that as well, using jQuery UI to create tabs and separate the content in the admin section.
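Swapping the dictionary for a database lookup could look roughly like the sketch below. It is written for Python 3 and uses the standard library's `sqlite3` with an in-memory database purely as a stand-in, since the post does not say which database is used. The `users` table, its columns, the helper names and the PBKDF2 settings are all assumptions for illustration.

```python
import hashlib
import hmac
import os
import sqlite3

def hash_password(password, salt):
    # PBKDF2 from the standard library; the iteration count is an illustrative choice
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100000)

def create_user(conn, username, password):
    """Roughly what a 'create user' admin form would call (hypothetical helper)."""
    salt = os.urandom(16)
    conn.execute(
        "INSERT INTO users (username, salt, pw_hash) VALUES (?, ?, ?)",
        (username, salt, hash_password(password, salt)),
    )
    conn.commit()

def check_login(conn, username, password):
    """Return True if username/password match a row in the users table."""
    row = conn.execute(
        "SELECT salt, pw_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, pw_hash = row
    return hmac.compare_digest(bytes(pw_hash), hash_password(password, bytes(salt)))

# Hypothetical schema, created in memory so the sketch is self-contained
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB)")
create_user(conn, "alice", "wonderland")
print(check_login(conn, "alice", "wonderland"), check_login(conn, "alice", "nope"))
```

The `?` placeholders keep user input out of the SQL string itself, which matters for a login form.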

All in all, a nice week of nice work.

I’m starting to think this might make its way to my hosting server in one of the coming weeks… although I need to do some more work on showing each user only their own keywords and not all the keywords, as well as doing something with the keywords to use them better.

Oh and one more thing: this works better under SSL than in the clear http: sky !


Simple Python threading.Thread example using Queue

I managed to write a really simple example of using threads in Python, which I hope will give me more insight into how to adapt my other programs. I’m also writing it down so I can re-use it later: if I need to revisit this, it would be handy not to have to scour the internet again to assemble the bits and pieces of threading with Python.

The example below uses 3 threads, and processes 10 pairs of numbers (tuples) that I put in a list.

# Our list of work todo
inputlist_ori = [ (5,5),(10,4),(78,5),(87,2),(65,4),(10,10),(65,2),(88,95),(44,55),(33,3) ]

Those numbers are divided over those 3 threads by the Queue system.

The Queue system itself is limited to 5 slots, although this could easily be changed to more or less. You will notice in the console print that the message “Waiting for threads to finish.” appears after the fifth result, indicating that the queues are being used and the main program has continued on.
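The effect of a size limit on the queue can be seen in isolation with a tiny sketch. It is written for Python 3, where the module is called `queue` (it is `Queue` in the Python 2 listing of this post). The limit of 2 and the short timeout are arbitrary choices to keep the demonstration quick.

```python
import queue

jobs = queue.Queue(2)          # room for 2 items only (the post uses 5)
jobs.put("first")
jobs.put("second")

try:
    # Nobody is consuming, so this blocks for the timeout and then gives up
    jobs.put("third", block=True, timeout=0.1)
    overflowed = False
except queue.Full:
    overflowed = True

print(jobs.full(), overflowed)
```

In the real program the worker threads keep taking items out, so a blocked `put` normally succeeds as soon as a slot frees up.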

After putting everything in the queue system, the program waits until all the jobs have been processed, using the queue’s .join() function.

All spawned threads stay active and keep accepting jobs until the queue has been empty for a full second (the timeout on the get() call), at which point they shut down.

I based most of my simple example on the examples in the Python threading tutorial (.pdf) by Norman Matloff and Francis Hsu that I referenced in a previous blog post. However, while their examples undoubtedly do more and are more extensive, they are also more complex. This example is deliberately made as simple as possible, to make the basic principles of threading and the queue system easy to understand.

Things I stumbled over:

  • Duh! You spawn the threads before you fill up the queues with stuff todo…
  • When printing out things to the console or python shell, things got jumbled because different threads took over from each other – to solve that I created one threading.Lock() object that all threads share, and call its acquire() and release() around each print to make sure that a thread can finish printing. Not sure if I completely understand all the possibilities this offers.
  • Still a bit stumped on getting more info (name, etc.) on the thread that is running at the moment – haven’t figured out yet how to do that.
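On the last two points, a small sketch (written for Python 3) of one way to get at the running thread's name while sharing a single lock for printing. `threading.current_thread()` returns the Thread object of whichever thread calls it, and inside a `threading.Thread` subclass `self.name` gives the same name. The function `report`, the list `seen_names` and the `worker-N` names are made up for this illustration.

```python
import threading

print_lock = threading.Lock()   # one lock object shared by every thread
seen_names = []

def report():
    # current_thread() returns the Thread object for whichever thread calls it
    me = threading.current_thread()
    with print_lock:            # acquired here, released when the block is left
        print("Hello from", me.name)
        seen_names.append(me.name)

threads = [threading.Thread(target=report, name="worker-{0}".format(n)) for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The `with` form releases the lock even if the print raises an exception, which a separate acquire()/release() pair does not guarantee.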

Feel free to comment and ask questions – if you can improve this program, please let me know !

# threading test
# Alex Boschmans
# www.boschmans.net
# January 2010

#
# IMPORT SECTION
#
import threading, Queue

#
# Variables setup
#
THREAD_LIMIT = 3                # This is how many threads we want
jobs = Queue.Queue(5)           # This sets up the queue object to use 5 slots
singlelock = threading.Lock()   # This is a lock so threads don't print through each other (and other reasons)

# Our list of work todo
inputlist_ori = [ (5,5),(10,4),(78,5),(87,2),(65,4),(10,10),(65,2),(88,95),(44,55),(33,3) ]

#
# This is called from the main function
# It spawns the threads, fills up the queue with work items that the threads will use
# And then waits for the threads to finish
# This could use some more try:except code...
#
def draadje(inputlist):
    print "Inputlist received..."
    print inputlist

    # Spawn the threads
    print "Spawning the {0} threads.".format(THREAD_LIMIT)
    for x in xrange(THREAD_LIMIT):
        print "Thread {0} started.".format(x)
        # This is the thread class that we instantiate.
        workerbee().start()

    # Put stuff in queue
    print "Putting stuff in queue"
    for i in inputlist:
        # Block if queue is full, and wait 5 seconds. After 5s raise Queue.Full error.
        try:
            jobs.put(i, block=True, timeout=5)
        except Queue.Full:
            singlelock.acquire()
            print "The queue is full !"
            singlelock.release()

    # Wait for the threads to finish
    singlelock.acquire()        # Acquire the lock so we can print
    print "Waiting for threads to finish."
    singlelock.release()        # Release the lock
    jobs.join()                 # This waits until every queued job has been marked done with task_done().

#
# Main thread class - based on threading.Thread
# This class is cloned/used as a thread template to spawn those threads.
# The class has a run function that gets a job out of the jobs queue
# And lets the queue object know when it has finished.
#
class workerbee(threading.Thread):
    def run(self):
        # run until the queue stays empty for a full second
        while True:
            # Try and get a job out of the queue, waiting at most 1 second
            try:
                job = jobs.get(True,1)
            except Queue.Empty:
                break           # No more jobs in the queue
            singlelock.acquire()        # Acquire the lock
            print "Multiplication of {0} with {1} gives {2}".format(job[0],job[1],(job[0]*job[1]))
            singlelock.release()        # Release the lock
            # Let the queue know the job is finished.
            jobs.task_done()

#
# Executes if the program is started normally, not if imported
#
if __name__ == '__main__':
    # Call the mainfunction that sets up threading.
    draadje(inputlist_ori)

Sigh. I just finished adding spaces to show where a def ends, and the damn code highlighter removed it again. Grrrrrr. If you want a copy of the code, let me know and I’ll update this post with a zipped copy of it.

Update: just discovered the “Syntaxhighlighter evolved” plugin and updated the code – indentation now works !!!
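The listing above is Python 2 (print statements, the `Queue` module, `xrange`). For readers on Python 3, the same pattern could be sketched as follows. This is an adaptation, not the original program: it uses a plain function as the thread target instead of a `threading.Thread` subclass, marks the workers as daemon threads so they don't delay interpreter exit, and collects the products in a list. The names `run_jobs`, `worker` and `products` are made up for this sketch.

```python
import queue
import threading

THREAD_LIMIT = 3    # same number of worker threads as in the post

def run_jobs(pairs):
    jobs = queue.Queue(5)       # at most 5 pending jobs at a time
    lock = threading.Lock()     # guards printing and the results list
    results = []

    def worker():
        while True:
            try:
                a, b = jobs.get(block=True, timeout=1)
            except queue.Empty:
                break           # queue stayed empty for a second: stop
            with lock:
                print("Multiplication of {0} with {1} gives {2}".format(a, b, a * b))
                results.append(a * b)
            jobs.task_done()    # tell the queue this job is finished

    # Spawn the workers first, then feed them
    for _ in range(THREAD_LIMIT):
        threading.Thread(target=worker, daemon=True).start()

    for pair in pairs:
        jobs.put(pair)          # blocks while the queue is full

    jobs.join()                 # returns once every job has been marked done
    return results

products = run_jobs([(5, 5), (10, 4), (78, 5), (87, 2), (65, 4)])
```

The order of the results depends on thread scheduling, so sort them if a stable order matters.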

Not using regular expressions (re or regex) to find a #hashtag (python).

First, a quick reminder for myself: there’s an extremely good guide to regex on Andrew M. Kuchling’s pages.

Secondly, you don’t really *need* regex to parse for hashtags in a tweet – it’s a bit of overkill. The short code further down does the job as well, and I wrote it in 1 minute, after spending 15 minutes searching for a regex that would reliably include hyphens ( – ) and other non-word characters when they are put into the hashtag.

The regular expression that I find works quite well for all hashtags that don’t have a hyphen in them:

>>> import re
>>> hashtag = "This is a #hashtag #test-link #a should#not#work"
>>> x = re.compile(r'\B#\w+')
>>> x.findall(hashtag)
['#hashtag', '#test', '#a']

So the above code correctly finds all words beginning with a hashtag, and not the ones that contain a hashtag inside the word. Note that the hyphen and the word after it are not included.
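One pattern that seems to cover the hyphen case is to widen the character class from `\w` to `[\w-]`, so hyphens are accepted as part of the tag. This is only a sketch (Python 3 syntax) and the name `HASHTAG` is made up here. A hashtag ending in a hyphen, such as `#test-`, would keep its trailing hyphen, which may or may not be wanted.

```python
import re

# [\w-]+ accepts word characters and hyphens after the '#'
HASHTAG = re.compile(r"\B#[\w-]+")

tweet = "This is a #hashtag #test-link #a should#not#work"
found = HASHTAG.findall(tweet)
print(found)   # ['#hashtag', '#test-link', '#a']
```

The `\B` in front still rejects a `#` that sits directly after a word character, so `should#not#work` is skipped as before.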

This is the short code I wrote that does all I want:

>>> hashtag = "This is a #hashtag #test-link #a should#not#work"
>>> for word in hashtag.split():
...     if word[0] == "#":
...         print word
...
#hashtag
#test-link
#a

In section 6 of the above-mentioned guide, Andrew states that in some cases string methods (like split) are faster than using regex. For simplicity, I’m going to use the latter code.

Update: Grrr – discovered that the tweets I am processing are in HTML, so they have href tags around them – which means of course that there are no blanks for me to split the words on. After another unsuccessful session with regex, and just to be able to continue, I’ve used the BeautifulSoup HTML parsing library to get around that by stripping out all tags and then splitting the sentence up again. Probably not as efficient as using regex directly, so I’ll have to revisit this in the future.
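The strip-the-tags-then-split idea can be sketched with the standard library alone. Note the swap: this uses Python 3's `html.parser.HTMLParser` instead of BeautifulSoup, simply so the example has no third-party dependency. The class `TextOnly`, the function `hashtags_from_html` and the sample markup are invented for this illustration and are not real tweet data.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collects only the text between tags, dropping the tags themselves."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def hashtags_from_html(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Join with spaces so text on either side of a tag does not run together
    text = " ".join(parser.chunks)
    return [word for word in text.split() if word.startswith("#")]

# Made-up sample markup
sample = ('Reading about <a href="http://example.com/search?q=%23python">#python</a>'
          ' and <a href="http://example.com/search?q=%23test-link">#test-link</a>')
print(hashtags_from_html(sample))   # ['#python', '#test-link']
```

Joining the chunks with a space would split a hashtag in two if a tag sat in the middle of it, so this only suits markup where each hashtag is wrapped as a whole.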