Subredditor

Over Thanksgiving vacation I decided I wanted to see how various subreddits were connected and what their relative sizes were. Some projects seemed to tackle this goal, but I didn’t like the interface or how old their datasets were so I started to build my own. I began by building a scraper that would look at a specific subreddit, find the related subreddits section (for this I used PRAW), parse the section for the subreddit links, and build a map of the connections.  The application would visit each subreddit referenced until all known subreddits were visited.  The code for the crawler is here: https://github.com/cdated/reddit-crawler

Since crawling the entire site (with rate-limiting) took a couple of days I eventually updated the crawler to insert the additions into MongoDB. This ensured progress would not be lost if the application crashed or the internet connection was interrupted. Once the dataset was generated I wanted to make an interactive graph anyone could access on the internet. So first I needed a simple web server that would accept a few parameter; subreddit, graph depth, and nsfw. Without much trouble I got a flask server to return a static graph image using Python’s graphviz library. I had a little experience with Heroku so decided to put my current work up there.

Having a public interface to my project, I was emboldened to improve the usability and wanted to try out D3.js.  From the D3.js homepage I found an interactive graph example that would suit my needs. After altering the graph data to match the D3.js format I was able to get what I wanted working in JavaScript. This opened a lot of options for me to make the nodes draggable, turn the nodes into links, change the size of the nodes to represent the subscriber counts, and dynamically color the nodes and links to make graph look more attractive.

subredditor sample graph

I still have a lot of changes I want to make to the project when I have time. The database currently uses MongoLab’s free tier which makes the deployment a lot slower than my development environment. I eventually want to update the crawler to use Postgresql’s hstore then I can leverage Heroku’s Posgresql support. Likewise, while deploying in Heroku is very convenient it also imposes many constraints. Migrating to a VPS would force me to work at all levels of the deployment.

The code: https://github.com/cdated/subredditor

Live instance: http://subredditor.com

Tiny Code

I just found out about this neat little subreddit,  http://www.reddit.com/r/tinycode, and thought I should spread the word.  The idea is to share neat snippets of code that do impressive things with a focus on minimalism.  One thing I especially like is that the site isn’t merely code golf and obfuscation, but instead powerful solutions in relatively few lines of code.  The major benefit of keeping things small is that you’ve boiled a problem down to as few operations as possible.

As a matter of finding learning material, you usually only want the part of the solution you don’t yet understand and not a full-blown application with features currently over your head.  After all, when trying to work out the tiny but complicated and time-consuming portions of a project I’ll search through blog posts and Stack Overflow submissions until I piece together to a solution suitable for my implementation.  If a snippet is too long it’s not likely it will work well in a general case.  If it’s too complicated to be readily understood it’ll get put on the back-burner until I find a better solution or give up and start digesting it.

Here’s a submission that won’t take long to wrap your head around: Four lines of Python for a Spellchecker

import sys
a = sys.argv[1:]
s = [x.lower()[:-1] for x in open("/usr/share/dict/words")]
print " ".join([("\x1b[31m"+w+"\x1b[0m" if w in [_ for _ in a if _ not in s] else w) for w in a])

$./spellchecker.py hello this woard is wrong
hello this woard is wrong

I will admit code golf can be pretty cool, this little gem emerged in the comments:

print(' '.join('! '[w+'\n'in open("/usr/share/dict/words")]+w for w in __import__('sys').argv[1:]))

$./spellchecker.py hello this woard is wrong
hello this !woard is wrong

As for actually creating a more sophisticated spell checker, I recommend Peter Novig’s write-up norvig.com/spell-correct.html, and coursera.org‘s NLP class. As an added benefit I learned about the “words” package, which is nice because I always forget where I store my one time use dictionary files.

PyGTK TreeView Rubber Banding

Problem:

Enable multiple selections in a GTK TreeView using the mouse (click and drag) to select the desired nodes/rows.

Solution:

For the most part this is a very simple and straightforward task; however, the first time I tried to do it I found the number of examples to be a bit sparse.  First off I’ll be using the TreeView widget example from pygtk.org.  If you are new to TreeViews you should read the official documentation first.

The following file can be downloaded from here.

#!/usr/bin/env python

# example basictreeview.py

import pygtk
pygtk.require('2.0')
import gtk

class BasicTreeViewExample:

# close the window and quit
def delete_event(self, widget, event, data=None):
gtk.main_quit()
return False

def __init__(self):
# Create a new window
self.window = gtk.Window(gtk.WINDOW_TOPLEVEL)

self.window.set_title("Basic TreeView Example")

self.window.set_size_request(200, 200)

self.window.connect("delete_event", self.delete_event)

# create a TreeStore with one string column to use as the model
self.treestore = gtk.TreeStore(str)

# we'll add some data now - 4 rows with 3 child rows each
for parent in range(4):
piter = self.treestore.append(None, ['parent %i' % parent])
for child in range(3):
self.treestore.append(piter, ['child %i of parent %i' %
(child, parent)])

# create the TreeView using treestore
self.treeview = gtk.TreeView(self.treestore)

# create the TreeViewColumn to display the data
self.tvcolumn = gtk.TreeViewColumn('Column 0')

# add tvcolumn to treeview
self.treeview.append_column(self.tvcolumn)

# create a CellRendererText to render the data
self.cell = gtk.CellRendererText()

# add the cell to the tvcolumn and allow it to expand
self.tvcolumn.pack_start(self.cell, True)

# set the cell "text" attribute to column 0 - retrieve text
# from that column in treestore
self.tvcolumn.add_attribute(self.cell, 'text', 0)

# make it searchable
self.treeview.set_search_column(0)

# Allow sorting on the column
self.tvcolumn.set_sort_column_id(0)

# Allow drag and drop reordering of rows
self.treeview.set_reorderable(True)

self.window.add(self.treeview)

self.window.show_all()

def main():
gtk.main()

if __name__ == "__main__":
tvexample = BasicTreeViewExample()
main()

When run, the result will look like this:

At this point clicking and dragging you mouse will only move nodes around and will not allow multiple selection.

In order to do rubber band selection you must:

1) Set the “rubber_banding” attribute to be true for the TreeView.

2) Get the TreeView’s TreeSelection object, and set the mode to gtk.SELECTION_MULTIPLE.

self.treeview.set_rubber_banding(True)
self.treeview_selection = self.treeview.get_selection()
self.treeview_selection.set_mode(gtk.SELECTION_MULTIPLE)

Adding the 3 lines above to the example code at line 39 will allow rubber band selection of a TreeStore:


The selection works just as well if the TreeView’s model is a GTK ListStore instead of a TreeStore.  The changes necessary to use a ListStore have been highlighted.

#!/usr/bin/env python

# example basictreeview.py

import pygtk
pygtk.require('2.0')
import gtk

class BasicTreeViewExample:

# close the window and quit
def delete_event(self, widget, event, data=None):
gtk.main_quit()
return False

def __init__(self):
# Create a new window
self.window = gtk.Window(gtk.WINDOW_TOPLEVEL)

self.window.set_title("Basic TreeView Example")

self.window.set_size_request(200, 200)

self.window.connect("delete_event", self.delete_event)

# create a TreeStore with one string column to use as the model
self.liststore = gtk.ListStore(str)

# create the TreeView using liststore
self.treeview = gtk.TreeView(self.liststore)

# add rows to the liststore
for i in range(8):
iter = self.liststore.append()
self.liststore.set (iter, 0, "item " + str(i))

# add rubber-banding
self.treeview.set_rubber_banding(True)
self.treeview_selection = self.treeview.get_selection()
self.treeview_selection.set_mode(gtk.SELECTION_MULTIPLE)

# create the TreeViewColumn to display the data
self.tvcolumn = gtk.TreeViewColumn('Column 0')

# add tvcolumn to treeview
self.treeview.append_column(self.tvcolumn)

# create a CellRendererText to render the data
self.cell = gtk.CellRendererText()

# add the cell to the tvcolumn and allow it to expand
self.tvcolumn.pack_start(self.cell, True)

# set the cell "text" attribute to column 0 - retrieve text
# from that column in treestore
self.tvcolumn.add_attribute(self.cell, 'text', 0)

# make it searchable
self.treeview.set_search_column(0)

# Allow sorting on the column
self.tvcolumn.set_sort_column_id(0)

# Allow drag and drop reordering of rows
self.treeview.set_reorderable(True)

self.window.add(self.treeview)

self.window.show_all()

def main():
gtk.main()

if __name__ == "__main__":
tvexample = BasicTreeViewExample()
main()

I hope this helps, enjoy your newfound functionality of drawing shaded rectangle over your TreeView!