Subredditor

Over Thanksgiving vacation I decided I wanted to see how various subreddits were connected and what their relative sizes were. Some projects seemed to tackle this goal, but I didn't like their interfaces or how old their datasets were, so I started to build my own. I began with a scraper that would look at a specific subreddit, find its related subreddits section (for this I used PRAW), parse that section for subreddit links, and build a map of the connections. The application would then visit each referenced subreddit until every known subreddit had been visited. The code for the crawler is here: https://github.com/cdated/reddit-crawler
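The crawl described above amounts to a breadth-first traversal of the related-subreddits links. Here is a minimal sketch of that loop, not the actual reddit-crawler code: `get_related()` is a hypothetical stand-in for the PRAW lookup and link parsing, and the `RELATED` dictionary is made-up sample data.

```python
from collections import deque

def crawl(start, get_related):
    """Breadth-first crawl: visit every subreddit reachable from `start`.

    `get_related(name)` is a hypothetical callable returning the names
    linked from `name`'s related-subreddits section.
    Returns an adjacency map {subreddit: [related, ...]}.
    """
    graph = {}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        if name in graph:          # already visited via another path
            continue
        related = get_related(name)
        graph[name] = related
        for other in related:
            if other not in graph:
                queue.append(other)
    return graph

# Tiny stand-in for the real related-subreddits lookup (hypothetical data).
RELATED = {
    "python": ["learnpython", "django"],
    "learnpython": ["python"],
    "django": ["python", "flask"],
    "flask": [],
}
graph = crawl("python", lambda name: RELATED.get(name, []))
print(graph)
```

Because `graph` only gains an entry once a subreddit has been fetched, saving each entry to a database as it is added is one way to make an interrupted crawl resumable.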

Since crawling the entire site (with rate limiting) took a couple of days, I eventually updated the crawler to insert its results into MongoDB as it went. This ensured progress would not be lost if the application crashed or the internet connection was interrupted. Once the dataset was generated, I wanted to make an interactive graph anyone could access on the internet. First I needed a simple web server that would accept a few parameters: subreddit, graph depth, and nsfw. Without much trouble I got a Flask server to return a static graph image using Python's graphviz library. I had a little experience with Heroku, so I decided to put my current work up there.
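The server's graph-depth and nsfw parameters boil down to a depth-limited breadth-first search over the crawled map. A sketch under stated assumptions: the adjacency-map shape, the `nsfw_names` set, and the `to_dot()` helper are illustrative rather than taken from Subredditor, and the Flask route and graphviz rendering call are left out. DOT is the plain-text format Graphviz renders from.

```python
from collections import deque

def subgraph(graph, root, depth, nsfw=False, nsfw_names=frozenset()):
    """Return the directed edges reachable from `root` within `depth` hops.

    `nsfw_names` is a hypothetical set of subreddits flagged NSFW; they are
    skipped unless `nsfw` is True.
    """
    edges = []
    seen = {root}
    frontier = deque([(root, 0)])
    while frontier:
        name, dist = frontier.popleft()
        if dist == depth:
            continue                   # don't expand past the requested depth
        for other in graph.get(name, []):
            if not nsfw and other in nsfw_names:
                continue
            edges.append((name, other))
            if other not in seen:
                seen.add(other)
                frontier.append((other, dist + 1))
    return edges

def to_dot(edges):
    """Render edges as Graphviz DOT text."""
    lines = ["digraph subreddits {"]
    lines += ['    "%s" -> "%s";' % edge for edge in edges]
    lines.append("}")
    return "\n".join(lines)

# Hypothetical sample data in the crawler's adjacency-map shape.
RELATED = {
    "python": ["learnpython", "django"],
    "learnpython": ["python"],
    "django": ["python", "flask"],
    "flask": [],
}
print(to_dot(subgraph(RELATED, "python", depth=1)))
```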

Having a public interface to my project, I was emboldened to improve its usability, and I wanted to try out D3.js. On the D3.js homepage I found an interactive graph example that would suit my needs. After converting the graph data to match the D3.js format, I was able to get what I wanted working in JavaScript. This opened up a lot of options: I could make the nodes draggable, turn the nodes into links, size the nodes to represent subscriber counts, and dynamically color the nodes and links to make the graph look more attractive.
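D3's force-directed graph examples typically expect a `nodes` list plus a `links` list whose `source`/`target` fields refer to nodes (by index, in the classic examples). A minimal conversion sketch, assuming the crawler's adjacency map and a separate `subscribers` dictionary; the square-root radius scaling and its constants are an illustrative choice, not necessarily what Subredditor does.

```python
import json
import math

def to_d3(graph, subscribers, min_radius=5.0, scale=0.05):
    """Convert {name: [related, ...]} into D3-style nodes and links.

    Node radius grows with the square root of the subscriber count, so a
    circle's area (not its width) tracks subscribers. `min_radius` and
    `scale` are arbitrary illustrative values.
    """
    names = sorted(set(graph) | {o for rel in graph.values() for o in rel})
    index = {name: i for i, name in enumerate(names)}
    nodes = [
        {"name": name,
         "radius": min_radius + scale * math.sqrt(subscribers.get(name, 0))}
        for name in names
    ]
    links = [
        {"source": index[name], "target": index[other]}
        for name, related in graph.items()
        for other in related
    ]
    return {"nodes": nodes, "links": links}

# Hypothetical sample data.
print(json.dumps(to_d3({"python": ["django"], "django": []},
                       {"python": 90000, "django": 20000}), indent=2))
```

The resulting JSON can be served to the page and handed to the force layout as-is.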

subredditor sample graph

I still have a lot of changes I want to make to the project when I have time. The database currently uses MongoLab's free tier, which makes the deployment a lot slower than my development environment. I eventually want to update the crawler to use PostgreSQL's hstore so I can leverage Heroku's PostgreSQL support. Likewise, while deploying on Heroku is very convenient, it also imposes many constraints. Migrating to a VPS would force me to work at all levels of the deployment.

The code: https://github.com/cdated/subredditor

Live instance: http://subredditor.com

Drawing Lambdas

The other night I was watching a video on the lambda operator in Ruby 1.9.1, which can be spelled out or written as '->'. Coming from a C++ background, and having dabbled in Scheme and in Haskell (where a lambda is written '\'), I felt a little uncomfortable with this syntax. My first thought was that it'd be nice to knock off all of this ASCII tomfoolery and just use Unicode lambdas in code. This leads to two problems, though:

  • Unicode friendly keyboards don’t really exist
  • Source code tends to be strictly in ASCII

Typing Unicode

Let’s deal with the Unicode keyboard problem first…

One way to get lambdas into your code is to use digraphs in Vim. To get a full list in the editor, use the command ':digraphs', which draws a matrix of every available combination. As depicted in the screenshot below (scrolled down to the Greek symbols), each column has the two-key combination, the symbol, and the symbol's decimal Unicode value. To insert a symbol, press Ctrl-k followed by the two-key combination. For example, lambda is Ctrl-k l*.

What's really nice about the digraph table is that many combinations are easy to figure out. For example, every Greek symbol is Ctrl-k, then a letter, then an asterisk.
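The decimal column in the ':digraphs' listing is simply the character's Unicode code point, which Python's `ord()` also returns. A small sketch using a hand-picked subset of Greek digraphs (assumed from Vim's default RFC 1345 table; check ':digraphs' in your own Vim):

```python
import unicodedata

# A handful of Vim's default Greek digraphs (letter + '*').
# Only a small subset; run :digraphs in Vim for the full table.
GREEK_DIGRAPHS = {"a*": "α", "b*": "β", "g*": "γ", "d*": "δ", "l*": "λ", "p*": "π"}

for digraph, char in GREEK_DIGRAPHS.items():
    # ord() gives the same decimal code point the :digraphs listing shows
    print(digraph, char, ord(char), unicodedata.name(char))
```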

Some are even obvious:

Diacritic       Example   Digraph
macron          ē         letter + minus
trema/umlaut    ü         letter + colon
cedilla         ç         letter + comma
circumflex      û         letter + caret

Unicode in Action

Now, all this is fun to play with for a bit, but your compiler or interpreter will likely be upset with your latest additions. The few exceptions are languages that are very friendly to Unicode string literals, like Go and Python.

hello_world_gr.go

package main

import "fmt"

func main() {
	fmt.Println("Γεια σας κόσμο!")
}

hello_world_gr.py

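#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (Python 2 needs the coding line above for non-ASCII source; Python 3 assumes UTF-8)

def main():
    print("Γεια σας κόσμο!")

if __name__ == '__main__':
    main()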

However, when I said I wanted to see lambdas in my code, I didn't mean inside string literals; I meant actual lambdas as keywords. If I try to put a lambda symbol in place of the keyword 'lambda':

#!/usr/bin/env python

def main():
    # λ == lambda
    square = λ x : x * x
    nine = square(3)
    print(nine)

if __name__ == '__main__':
    main()

I will get the following error:

File "hello.py", line 5
square = λ x : x * x
^
SyntaxError: invalid syntax
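The error makes more sense once you check how Python 3 classifies the character: non-ASCII letters are allowed in identifiers (PEP 3131), so λ parses as an ordinary name rather than as the `lambda` keyword, and `λ x` is just one name followed by another. A quick sketch (Python 3 only):

```python
import keyword

# PEP 3131: non-ASCII letters are legal in Python 3 identifiers, so λ is
# just a name to the parser, not a spelling of the lambda keyword.
print("λ".isidentifier())           # True
print(keyword.iskeyword("λ"))       # False
print(keyword.iskeyword("lambda"))  # True

# Hence `λ x : x * x` is "name name : ...", which is a syntax error.
try:
    compile("square = λ x : x * x", "hello.py", "exec")
except SyntaxError as err:
    print("SyntaxError on line", err.lineno)

# Binding a lambda to a variable *named* λ is legal, if not very useful.
λ = lambda x: x * x
print(λ(3))                         # 9
```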

Faking It

This leads me to my second point: you don't really want lambdas (or any non-ASCII characters) in your code, but you may want to see lambdas in your editor. In Vim, you can use the conceal feature, which started out as a separate patch and is standard as of Vim 7.3. Conceal lets you create syntax rules that draw specific keywords and string matches as any symbol you like.

For instance, the Vim command:

:syn keyword Operator lambda conceal cchar=λ

will draw the keyword lambda as the symbol λ without modifying the text, as long as 'conceallevel' is set to 1 or higher (the .vimrc rules below set it to 2). Moving the cursor onto that line reveals the letters for editing. Your compiler never has to know you were experimenting with Unicode.

.vimrc rules

There are already many scripts that do this kind of thing, but I really like this solution for its ease of implementation. I've got the following rules in my .vimrc, which I'm sure I'll keep tweaking until I'm satisfied.

if has('conceal')
    if has('autocmd')
        autocmd Syntax * syn keyword Operator not conceal cchar=¬
        autocmd Syntax * syn keyword Operator lambda conceal cchar=λ
        autocmd Syntax ruby syn match rubyKeyword "->" conceal cchar=λ
        autocmd Syntax haskell syn match hsKeyword "\\" conceal cchar=λ
    endif
    hi! link Conceal Operator
    set conceallevel=2
endif

Essentially, there are four Vim syntax rules that don’t live in individual syntax files:

  • Draw not as ¬ for every file type.
  • Draw lambda as λ for every file type.
  • Draw -> as λ for all Ruby files.
  • Draw \ as λ for all Haskell files.

These rules are typically placed in syntax files, but I'm still test-driving them, and I don't want to repeat the first two rules in every syntax file. For the time being they can stay in my .vimrc where I can keep an eye on them.
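To make the effect of the four rules concrete, here is a toy model in Python of what they do to the displayed text. It is only an illustration: Vim's conceal works on syntax items and never rewrites the buffer, whereas this sketch applies regular-expression substitutions to a copy of the line, with `\b` word boundaries approximating `syn keyword`. The `RULES` table and `render()` function are hypothetical names.

```python
import re

# (filetype or "*" for every file type, regex, display symbol),
# mirroring the four .vimrc rules; illustrative only.
RULES = [
    ("*", r"\bnot\b", "¬"),
    ("*", r"\blambda\b", "λ"),
    ("ruby", r"->", "λ"),
    ("haskell", r"\\", "λ"),     # a single literal backslash
]

def render(line, filetype):
    """Return what would be *displayed*; the caller's text is not modified."""
    for ft, pattern, symbol in RULES:
        if ft in ("*", filetype):
            line = re.sub(pattern, symbol, line)
    return line

source = "square = lambda x: x * x"
print(render(source, "python"))   # square = λ x: x * x
print(source)                     # square = lambda x: x * x
```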

Now Stretch

The whole point of going through these steps is attaining more flexibility.  Hopefully now you feel less restricted by your keyboard and your languages’ syntax.

If you feel Pascal’s not equal, <>, makes your skin crawl, you can change it with:

:syn match pascalSymbolOperator '<>' conceal cchar=≠

or, to add it to your .vimrc:

autocmd Syntax pascal syn match pascalSymbolOperator '<>' conceal cchar=≠

For example:

while (a <> b) do WriteLn('Waiting');

is shown as

while (a ≠ b) do WriteLn('Waiting');

Everyone has different preferences, so if you don't like something, change it.

The Wonderful World of tmux

In my opinion, tmux is the best tool for interacting with the shell. Having used tmux daily for the last two years, I've put a lot of thought into customizing it to suit my needs. As an Arch Linux user, I found the tmux page on the ArchWiki to be an excellent resource for picking up tips on using and customizing tmux. My own .tmux.conf is available on GitHub, and for the remainder of this post I will break my configuration down.

Preliminary (ditch ‘Caps Lock’)

This step isn't necessary, but considering I use the Control key almost as often as the spacebar, I find it cuts down on hand contortion. I learned from using Emacs that if I'm going to tap the Control key all day long, it needs to be on the home row. There's usually an option in your desktop environment's keyboard settings to make Caps Lock behave as another Control key. I have the following in my .Xmodmap file, which loads when X starts:

remove Lock = Caps_Lock
keycode 0x42 = Control_L
add Control = Control_L

Prefix

The first thing that had to go was Ctrl-b as the prefix key, the key combo that precedes every tmux command. Even with the Control key on the home row, I find my hand a bit stretched for a gesture I have to make every time I invoke tmux. I suppose Ctrl-a and Ctrl-x were avoided because of GNU Screen and Emacs respectively, but I chose Ctrl-f since my left hand can make that combo nicely from a resting position.

# Change prefix key to Ctrl-f
unbind C-b
set -g prefix C-f

Next, prefix-d detaches the session by default. With Ctrl-f as the prefix, d sits right next to f and I hit this combo accidentally too often, so I unbound it and type out the 'detach' command (prefix-: detach) when I actually want to detach a session, which isn't often.

# Remove shortcut for detach session
unbind d

Navigation

As you may know, tmux lets you create numbered windows with prefix-c, which can be navigated forward and backward with prefix-n and prefix-p respectively. By default tmux starts numbering windows at zero, which is inconvenient because 0 and 1 sit at opposite ends of the keyboard's number row. So jumping between windows 0, 1, 2, and 3 with the prefix and a number feels unnatural. I resolve this by setting the base index to one instead of zero.

# Start numbering at 1 instead of 0
set -g base-index 1

I also find myself switching back and forth between two windows frequently, which prefix-l does nicely. However, using another key seems unnecessary when I can get the same effect by double-tapping the prefix (for me, Ctrl-f).

# Last active window
unbind l
bind C-f last-window

Likewise, prefix-& (Ctrl-f Shift-7) to kill a window seems very uncomfortable, which may be intended to prevent accidentally closing windows. For ergonomics, I use prefix-Ctrl-k instead. Note that, unlike the default binding, this kills the window without asking for confirmation.

# Kill window
bind C-k kill-window

Pane Management

To me, panes are the most fun part of tmux to work with, especially because they don't require fiddling with the mouse to line up code in windowed terminals. The default split mappings, though, could have been much simpler. My thinking is that a vertical bar should mean a vertical split and a horizontal bar a horizontal split. So I cheat a little by using the '-' and '\' keys to stand in for '_' and '|', which become my new horizontal/vertical split commands.

# More straightforward key bindings for splitting panes
unbind %
# '\' is quoted so newer tmux config parsers don't treat it as an escape
bind '\' split-window -h
unbind '"'
bind - split-window -v

To move between panes, you can use:

  • prefix-o, to cycle through panes
  • prefix-[up, down, left, right], to select by direction
  • prefix-q and enter the pane number, to go directly to the numbered pane

For resizing panes, I went with Vim's navigation keys to indicate the direction to 'push' the pane border: left (prefix-h), right (prefix-l), up (prefix-k), and down (prefix-j). Holding Shift with any of these moves the border 5 cells instead of 1.

# Pane
# Disable selecting panes with the mouse
# (removed in tmux 2.1; newer versions use 'set -g mouse off' instead)
set-option -g mouse-select-pane off

# Resize panes with the vi direction keys, 1 cell at a time
bind h resize-pane -L
bind l resize-pane -R
bind k resize-pane -U
bind j resize-pane -D

# Hold Shift to resize 5 cells at a time
bind H resize-pane -L 5
bind L resize-pane -R 5
bind K resize-pane -U 5
bind J resize-pane -D 5

Copy and Paste

One of the things I found completely baffling in GNU Screen was how the copy/paste functions were mapped. tmux did a better job with the open/close square brackets, but I decided to give up on those and use prefix-Ctrl-c and prefix-Ctrl-v. I feel that copy/paste in tmux and Screen gets neglected because the defaults don't make much sense. Adding a binding for xclip also makes copying and pasting in tmux more practical.

# Copy mode
unbind [
bind C-c copy-mode

# Paste mode
unbind ]
bind C-v paste-buffer

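# Save the tmux copy buffer and pipe it to xclip
# (xclip fills the PRIMARY selection by default; add '-selection clipboard'
# to paste it with Ctrl-v in other programs)
bind-key C-y save-buffer /tmp/tmux-buffer \; run-shell "cat /tmp/tmux-buffer | xclip"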

As a Vim user, I opted to have the mode keys follow vi/Vim conventions. I suggest looking at the short table in the tmux man page to see exactly what choosing vi or emacs for this option entails. As for the mouse, tmux can respond to mouse clicks and allow pane selection and resizing with the mouse. I have this disabled, though, because I prefer keyboard navigation and it interferes too much with my select/copy/paste operations.

# Use Vi mode
setw -g mode-keys vi
# Keep the mouse out of copy mode (removed in tmux 2.1; newer versions use 'set -g mouse off')
setw -g mode-mouse off

Environment

The following configurations determine what your tmux environment will look like, specifically conventions for text colors and window titles.

In my screenshot you can see the decisions I've made with regard to colors. There are three panes open in the active window and four windows in total (each running a different application). The active window is red and marked with an asterisk; the previous window's title ends with a hyphen. tmux can flag activity in each window from the status bar, but I have this disabled because some applications constantly write to the terminal.

# Status Bar
set -g status-bg black
set -g status-fg white
set -g status-interval 1
set -g status-left '#[fg=green]#H#[default]'
set -g status-right '#[fg=blue,bold]%m-%d-%y #[fg=red,bold]--#[fg=white,bold]%I:%M:%S#[fg=red,bold]--#[default]'

# Don't flag or announce activity in other windows
setw -g monitor-activity off
set -g visual-activity off

# Highlight the active window in the status bar
# (tmux 2.9 removed this option; use: setw -g window-status-current-style bg=red)
setw -g window-status-current-bg red
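tmux passes `status-left` and `status-right` through strftime(3), so the date and time fields in the `status-right` string above expand the same way Python's `strftime` does. A small sketch that drops the tmux-specific `#[fg=...]` style markers and shows what the remaining text renders to; `status_right_text()` is a hypothetical helper:

```python
from datetime import datetime

# The status-right string from the config above, minus tmux's #[...] style
# markers; what's left is plain strftime(3) text.
STATUS_RIGHT = "%m-%d-%y --%I:%M:%S--"

def status_right_text(now):
    """Render the date/time part of status-right for a given moment."""
    return now.strftime(STATUS_RIGHT)

print(status_right_text(datetime(2012, 11, 23, 15, 4, 5)))  # 11-23-12 --03:04:05--
```

Since `status-interval` is 1, tmux redraws this every second, which is what keeps the seconds field ticking.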

If you ever use tmux's clock (prefix-t), you can also set its color and mode:

# Clock
setw -g clock-mode-colour red
setw -g clock-mode-style 12

I always make sure my history has plenty of lines, since you can usually spare the memory, and you can search through it by entering copy mode and pressing '/' (if mode-keys is set to vi).

# History
set -g history-limit 100000

Lastly, remember to reload your .tmux.conf after making changes; this binding does it with prefix-r.

# Reload the config file
bind r source-file ~/.tmux.conf

Conclusion

Every change I have made is a response to an itch I've experienced with a default tmux setting. Being able to go into the config and get some relief is a beautiful thing. Of course, every change I outlined works great for me and not necessarily for anyone else, so keep tweaking your config until you find inner peace.

Once again, my .tmux.conf file can be viewed and downloaded here.

Just Do It

I have a terrible habit of grabbing a technical book, reading the first two chapters, and then putting it down to read the first two chapters of another technical book. I've tried to keep myself on task with several tools: tasks in Google Calendar, chorewars.com, and now schooltraq.com.

The latest book I'm trying to finish is MongoDB in Action. I'm a few pages into chapter 3, and I'm up to my old tricks again. The only thing I can think to do is keep going, and maybe write about it a bit to remind myself not to give in. So I don't forget, here are some reasons why I need to finish this book:

  • I need to do a better job finishing what I start, and now is when I need to do it.
  • It’s about time I learn the virtues of NoSQL.
  • MongoDB has an excellent Python driver; it's a shame not to use it.
  • SQLite (my go-to small application database) isn't designed for the web domain.
  • The book uses Ruby and JavaScript in its examples, languages I want to give more attention.
  • I am starting a web inventory project and these topics need to be learned.

Well, unless I can train a capuchin monkey to bite me when I try to start reading something else, I will have to hold myself accountable. Time to get back to reading…

Tiny Code

I just found out about this neat little subreddit, http://www.reddit.com/r/tinycode, and thought I should spread the word. The idea is to share snippets of code that do impressive things with a focus on minimalism. One thing I especially like is that the site isn't merely code golf and obfuscation, but powerful solutions in relatively few lines of code. The major benefit of keeping things small is that you've boiled a problem down to as few operations as possible.

When looking for learning material, you usually only want the part of the solution you don't yet understand, not a full-blown application with features currently over your head. After all, when trying to work out the tiny but complicated and time-consuming portions of a project, I'll search through blog posts and Stack Overflow answers until I piece together a solution suitable for my implementation. If a snippet is too long, it's unlikely to work well in the general case. If it's too complicated to be readily understood, it gets put on the back burner until I find a better solution or give up and start digesting it.

Here’s a submission that won’t take long to wrap your head around: Four lines of Python for a Spellchecker

import sys
a = sys.argv[1:]
s = [x.lower()[:-1] for x in open("/usr/share/dict/words")]
print(" ".join([("\x1b[31m"+w+"\x1b[0m" if w in [_ for _ in a if _ not in s] else w) for w in a]))

$ ./spellchecker.py hello this woard is wrong
hello this woard is wrong

(In the terminal, "woard" is printed in red.)
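For comparison, here is a less golfed Python 3 sketch of the same spellchecker. Loading the dictionary into a set instead of a list makes each lookup constant time, and lowercasing each input word keeps capitalized words from being flagged. The function names are illustrative, and `/usr/share/dict/words` has to exist on your system.

```python
import sys

RED, RESET = "\x1b[31m", "\x1b[0m"   # ANSI escape codes, as in the original

def load_words(path="/usr/share/dict/words"):
    """Read the system word list into a set for constant-time lookups."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return {line.strip().lower() for line in f}

def highlight(words, dictionary):
    """Join the words with spaces, wrapping unknown ones in red."""
    return " ".join(
        w if w.lower() in dictionary else RED + w + RESET
        for w in words
    )

# Only read the dictionary file when run with words to check,
# e.g. ./spellchecker.py hello this woard is wrong
if __name__ == "__main__" and len(sys.argv) > 1:
    print(highlight(sys.argv[1:], load_words()))
```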

I will admit code golf can be pretty cool; this little gem emerged in the comments:

print(' '.join('! '[w+'\n'in open("/usr/share/dict/words")]+w for w in __import__('sys').argv[1:]))

$ ./spellchecker.py hello this woard is wrong
hello this !woard is wrong

As for creating a more sophisticated spell checker, I recommend Peter Norvig's write-up at norvig.com/spell-correct.html and coursera.org's NLP class. As an added benefit, I learned about the "words" package, which provides /usr/share/dict/words; that's nice, because I always forget where I store my one-time-use dictionary files.
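The core idea in Norvig's write-up is to generate every string one edit away from the misspelled word (deletions, transpositions, replacements, insertions) and keep the most frequent known candidate. A condensed sketch of just that step, assuming a toy `counts` corpus in place of a real word-frequency list; his full version also considers words two edits away.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit away: deletes, transposes, replaces, inserts."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word, counts):
    """Return the word if known, else its most frequent known neighbor."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.get) if candidates else word

# Toy corpus (hypothetical); a real checker would count words in a large text.
counts = Counter("spelling is hard but spelling checkers help".split())
print(correct("speling", counts))   # spelling
```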

PyGTK TreeView Rubber Banding

Problem:

Enable multiple selections in a GTK TreeView using the mouse (click and drag) to select the desired nodes/rows.

Solution:

For the most part this is a simple and straightforward task; however, the first time I tried it I found examples to be a bit sparse. I'll start from the TreeView widget example on pygtk.org. If you are new to TreeViews, you should read the official documentation first.

The following file can be downloaded from here.

#!/usr/bin/env python

# example basictreeview.py

import pygtk
pygtk.require('2.0')
import gtk

class BasicTreeViewExample:

    # close the window and quit
    def delete_event(self, widget, event, data=None):
        gtk.main_quit()
        return False

    def __init__(self):
        # Create a new window
        self.window = gtk.Window(gtk.WINDOW_TOPLEVEL)

        self.window.set_title("Basic TreeView Example")

        self.window.set_size_request(200, 200)

        self.window.connect("delete_event", self.delete_event)

        # create a TreeStore with one string column to use as the model
        self.treestore = gtk.TreeStore(str)

        # we'll add some data now - 4 rows with 3 child rows each
        for parent in range(4):
            piter = self.treestore.append(None, ['parent %i' % parent])
            for child in range(3):
                self.treestore.append(piter, ['child %i of parent %i' %
                                              (child, parent)])

        # create the TreeView using treestore
        self.treeview = gtk.TreeView(self.treestore)

        # create the TreeViewColumn to display the data
        self.tvcolumn = gtk.TreeViewColumn('Column 0')

        # add tvcolumn to treeview
        self.treeview.append_column(self.tvcolumn)

        # create a CellRendererText to render the data
        self.cell = gtk.CellRendererText()

        # add the cell to the tvcolumn and allow it to expand
        self.tvcolumn.pack_start(self.cell, True)

        # set the cell "text" attribute to column 0 - retrieve text
        # from that column in treestore
        self.tvcolumn.add_attribute(self.cell, 'text', 0)

        # make it searchable
        self.treeview.set_search_column(0)

        # Allow sorting on the column
        self.tvcolumn.set_sort_column_id(0)

        # Allow drag and drop reordering of rows
        self.treeview.set_reorderable(True)

        self.window.add(self.treeview)

        self.window.show_all()

def main():
    gtk.main()

if __name__ == "__main__":
    tvexample = BasicTreeViewExample()
    main()

When run, the result will look like this:

At this point, clicking and dragging your mouse will only move nodes around; it will not select multiple rows.

To enable rubber-band selection you must:

1) Set the TreeView's "rubber-banding" property to True.

2) Get the TreeView's TreeSelection object and set its mode to gtk.SELECTION_MULTIPLE.

self.treeview.set_rubber_banding(True)
self.treeview_selection = self.treeview.get_selection()
self.treeview_selection.set_mode(gtk.SELECTION_MULTIPLE)

Adding the 3 lines above to the example code at line 39 (just after the TreeView is created, before the TreeViewColumn is set up) enables rubber-band selection of a TreeStore:


The selection works just as well if the TreeView's model is a GTK ListStore instead of a TreeStore. Compared with the TreeStore example, only the model setup changes (the ListStore and the loop that fills it with rows), plus the three rubber-banding lines.

#!/usr/bin/env python

# example basictreeview.py

import pygtk
pygtk.require('2.0')
import gtk

class BasicTreeViewExample:

    # close the window and quit
    def delete_event(self, widget, event, data=None):
        gtk.main_quit()
        return False

    def __init__(self):
        # Create a new window
        self.window = gtk.Window(gtk.WINDOW_TOPLEVEL)

        self.window.set_title("Basic TreeView Example")

        self.window.set_size_request(200, 200)

        self.window.connect("delete_event", self.delete_event)

        # create a ListStore with one string column to use as the model
        self.liststore = gtk.ListStore(str)

        # create the TreeView using liststore
        self.treeview = gtk.TreeView(self.liststore)

        # add rows to the liststore
        for i in range(8):
            row_iter = self.liststore.append()
            self.liststore.set(row_iter, 0, "item " + str(i))

        # add rubber-banding
        self.treeview.set_rubber_banding(True)
        self.treeview_selection = self.treeview.get_selection()
        self.treeview_selection.set_mode(gtk.SELECTION_MULTIPLE)

        # create the TreeViewColumn to display the data
        self.tvcolumn = gtk.TreeViewColumn('Column 0')

        # add tvcolumn to treeview
        self.treeview.append_column(self.tvcolumn)

        # create a CellRendererText to render the data
        self.cell = gtk.CellRendererText()

        # add the cell to the tvcolumn and allow it to expand
        self.tvcolumn.pack_start(self.cell, True)

        # set the cell "text" attribute to column 0 - retrieve text
        # from that column in liststore
        self.tvcolumn.add_attribute(self.cell, 'text', 0)

        # make it searchable
        self.treeview.set_search_column(0)

        # Allow sorting on the column
        self.tvcolumn.set_sort_column_id(0)

        # Allow drag and drop reordering of rows
        self.treeview.set_reorderable(True)

        self.window.add(self.treeview)

        self.window.show_all()

def main():
    gtk.main()

if __name__ == "__main__":
    tvexample = BasicTreeViewExample()
    main()
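Once multiple selection works, you'll probably want to read which rows were picked. In PyGTK, `gtk.TreeSelection.get_selected_rows()` returns a `(model, paths)` tuple, and indexing the model with a path gives that row. A small sketch of a helper built on that; the `selected_values()` name and the `on_selection_changed` handler in the comments are hypothetical.

```python
def selected_values(selection, column=0):
    """Return the value in `column` for every selected row.

    `selection` is expected to behave like a gtk.TreeSelection, whose
    get_selected_rows() returns (model, paths); model[path] is that row.
    """
    model, paths = selection.get_selected_rows()
    return [model[path][column] for path in paths]

# Possible wiring inside BasicTreeViewExample.__init__ (PyGTK 2.x):
#     self.treeview_selection.connect("changed", self.on_selection_changed)
# with a hypothetical handler method such as:
#     def on_selection_changed(self, selection):
#         print(selected_values(selection))
```

Keeping the helper free of GTK imports means the row-reading logic can be exercised with a stand-in selection object outside a running GTK main loop.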

I hope this helps. Enjoy your newfound ability to draw shaded rectangles over your TreeView!