Posted by
Tsukasa on January 3, 2010
It is time again for me to run a Python Golf challenge… The aim? To write a Python program that solves a problem in the fewest bytes of source code.
The first of a few problems is:
Given a list of words on stdin (one per line), find the words that have the largest number of anagrams in that list.
Print all of the words that meet the criterion of having the largest number of anagrams (one per line, in alphabetical order).
This competition has now finished. The winner was Nick Cooper at 103 bytes, with the following awesome solution:
import sys
s=sorted
o=s(sys.stdin)
r=map(s,o)
d=map(r.count,r)
for e,t in zip(o,d):print e*(max(d)==t),
Input:
caret
crate
react
trace
ester
reset
steer
terse
organ
groan
Output:
caret
crate
ester
react
reset
steer
terse
trace
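For anyone who finds the golfed version hard to follow, here is a rough, ungolfed sketch of the same idea. It is written for Python 3 rather than the Python 2 of the winning entry, and the function name most_anagrammed is just something made up for illustration. It relies on the same observation as the winning entry: two words are anagrams exactly when their sorted letters are equal.

```python
from collections import defaultdict


def most_anagrammed(words):
    """Return the words belonging to the largest anagram group(s), sorted."""
    groups = defaultdict(list)
    for word in words:
        # Anagrams share the same letters, so their sorted letters are equal.
        groups["".join(sorted(word))].append(word)
    if not groups:
        return []
    largest = max(len(group) for group in groups.values())
    return sorted(
        word
        for group in groups.values()
        if len(group) == largest
        for word in group
    )


sample = ["caret", "crate", "react", "trace", "ester",
          "reset", "steer", "terse", "organ", "groan"]
result = most_anagrammed(sample)
print("\n".join(result))
```

A real entry would read the words from sys.stdin and strip the trailing newlines first; the list above just reuses the sample input.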
Posted by
Tsukasa on April 16, 2009
Have you ever wondered how OS X decides which email address to use when you email a contact? A lot of the time you get to choose, but there are times when the system decides for itself, such as when you are using iCal to send out event invitations.
There is actually a defined way that the system picks an address, and it is even a public API. The problem, though, is that Address Book does not expose this interface to the user.
So, after some research (to find out how this is actually implemented), I developed a small command-line tool to change the default email address for a user. This tool takes the form of a small Python program, using the PyObjC bindings to the AddressBook framework.
The source code is available in my stuff repository, under the path osx/addressbook/edit_default_address.py.
You will also find in the same folder: a script to create a mutt alias file from the system address book, and a script to convert all of the phone numbers into the international format (from the Australian format).
These three programs form a good set of examples of using the Python Objective-C bridge with the AddressBook framework.
Posted by
Tsukasa on March 9, 2009
As some people know, I read a lot of RSS feeds. I am currently subscribed to 69 separate feeds, and this number is slowly growing.
Something that I am starting to notice is that quite a few websites/blogs don’t give you the whole story within the RSS feed. This is quite annoying: I read a lot of things while I am mobile (read: with limited Internet connectivity), so having to click on each article’s link before I leave for the train is a nuisance.
I have finally gotten around to creating a (web) service that will take these feeds and ‘fix’ them. To do this, I take the RSS feed that is provided by the website/blog, and grab the full page that each entry links to. Then I run an XPath query over this page and dump the result into the output RSS feed.
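To make the idea concrete, here is a minimal sketch of that pipeline. It is not the code of the service itself: the names fix_feed and fetch_page are hypothetical, the pages are assumed to be well-formed XML, and it uses the limited XPath subset that the standard library's ElementTree supports. Real-world HTML would most likely need a proper HTML parser and a full XPath engine.

```python
import xml.etree.ElementTree as ET


def fix_feed(rss_text, fetch_page, query):
    """Replace each item's description with content selected from its linked page.

    fetch_page is a hypothetical callable mapping a URL to the page source.
    """
    feed = ET.fromstring(rss_text)
    for item in feed.iter("item"):
        link = item.findtext("link")
        if not link:
            continue
        page = ET.fromstring(fetch_page(link))
        matches = page.findall(query)
        if not matches:
            continue  # nothing selected: keep the original summary
        body = "".join(ET.tostring(m, encoding="unicode") for m in matches)
        description = item.find("description")
        if description is None:
            description = ET.SubElement(item, "description")
        # The markup is stored as text, so it is escaped when serialised.
        description.text = body
    return ET.tostring(feed, encoding="unicode")


# Stand-in for the network: one made-up page keyed by a made-up URL.
PAGES = {
    "http://example.invalid/post-1":
        "<html><body><div id='nav'>menu</div>"
        "<div id='content'><p>The whole story.</p></div></body></html>",
}
RSS = (
    "<rss version='2.0'><channel><title>Demo</title><item>"
    "<title>Post 1</title><link>http://example.invalid/post-1</link>"
    "<description>Teaser only</description></item></channel></rss>"
)
fixed = fix_feed(RSS, PAGES.__getitem__, ".//div[@id='content']")
print(fixed)
```

Passing the fetcher in as an argument keeps the sketch runnable without a network connection; a deployed version would fetch the linked pages over HTTP instead.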
You can grab the latest version of the service from its development repository at http://tsukasa.net.au/~hg/feed_fixer, or you may use my installed version at http://tsukasa.net.au/feed_fixer.
Remember that I am actually using this service, so DO NOT delete any of the entries that are there.
Some features that will be coming in a future version are ‘hidden’ feeds (with a special key to gain access to the feed), feeds requiring a password, and password-protected trac feeds (as they decided to require users to have a cookie). Feature requests are welcome.
Posted by
Tsukasa on January 5, 2009
Well, here we have yet another challenge… Though this one does not involve any prizes, as I want my students at the NCSS camp to actually do their projects and not spend all their time on my competition.
So, what has to be done here? Well, it is quite simple really… Design a bot to play snake for you. I am requiring that submissions be written in Python (though there is no real need, other than that I am teaching Python at the moment). The input/output of the program is quite simple… The output consists of one of the following four letters: ‘U’, ‘D’, ‘L’, ‘R’ (corresponding to Up, Down, Left and Right).
The input consists of the following:
width height snake_id
board layout
An example is:
7 7 A
..*....
..A....
..a....
..a....
.......
.......
..Bbbb.
Where * is an apple, . is an empty cell, the uppercase letters are the heads of the snakes, and the lowercase letters are their bodies.
Code for the engine is located in my repository under the snake project. Submissions should be made to my email address, or in person.
The initial competition will have bots go up against each other, with no time limit given per move. The later rounds will ensure that all bots get equal CPU time (by running the faster bots more often).
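As a starting point, here is a minimal sketch of a bot (in Python 3) that parses the input described above and heads for the nearest apple using a breadth-first search. It rests on a few assumptions that are not spelled out in this post: that the first board line is the top row (so ‘U’ means moving to the previous line), that only ‘.’ and ‘*’ cells are safe to enter, and that the helper names parse_input and choose_move are made up for illustration.

```python
from collections import deque

# Assumption: row 0 is the top of the board, so 'U' decreases the row index.
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def parse_input(text):
    """Split the engine input into the board rows and this bot's snake id."""
    lines = text.splitlines()
    _width, height, snake_id = lines[0].split()
    return lines[1:1 + int(height)], snake_id


def choose_move(board, snake_id):
    """Return the first step of a shortest path from our head to an apple."""
    height, width = len(board), len(board[0])
    head = next((r, c) for r, row in enumerate(board)
                for c, cell in enumerate(row) if cell == snake_id)

    def free(r, c):
        return 0 <= r < height and 0 <= c < width and board[r][c] in ".*"

    queue = deque()
    seen = {head}
    for move, (dr, dc) in MOVES.items():
        cell = (head[0] + dr, head[1] + dc)
        if free(*cell):
            seen.add(cell)
            queue.append((cell, move))
    # If no apple is reachable, fall back to any safe move (or 'U' if trapped).
    fallback = queue[0][1] if queue else "U"
    while queue:
        (r, c), first_move = queue.popleft()
        if board[r][c] == "*":
            return first_move
        for dr, dc in MOVES.values():
            nxt = (r + dr, c + dc)
            if nxt not in seen and free(*nxt):
                seen.add(nxt)
                queue.append((nxt, first_move))
    return fallback


EXAMPLE = """7 7 A
..*....
..A....
..a....
..a....
.......
.......
..Bbbb.
"""
board, snake_id = parse_input(EXAMPLE)
move = choose_move(board, snake_id)
print(move)
```

A real submission would read the board from stdin rather than from the EXAMPLE string. This sketch also ignores where the other snakes might move next, which a competitive bot would want to consider.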
Posted by
Tsukasa on November 29, 2008
Well the deadline has come and passed, and we have a clear winner. That winner is Katie. Her solution comes in at a tiny 121 characters, with the closest solution coming from Tim with 187 characters.
The main difference came down to the fact that Katie decided to avoid using the regular expression that was given in the sample, and just parse the lines with str.split.
So without much more talking, here is the winning solution.
import sys
s=sum(([l[0]]*int(l[-1])for l in map(str.split,sys.stdin)if l[-1]!='-'),[])
for x in set(s):print x,s.count(x)
It is also interesting to note how she uses the sum function to concatenate a sequence of lists.
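For readers who have not seen the sum trick before, here is a small sketch of it in Python 3, followed by a more conventional version of the same str.split approach. The log lines and the name bytes_per_ip are invented for illustration; this is not one of the submitted entries.

```python
from collections import Counter

# sum() with a list as the start value concatenates lists.
nested = [["a"] * 2, ["b"] * 3, ["a"] * 1]
flat = sum(nested, [])
print(flat.count("a"), flat.count("b"))


def bytes_per_ip(lines):
    """Total the last field (bytes sent) per first field (IP address)."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        # A size of '-' means no body was sent, so skip those lines.
        if not fields or fields[-1] == "-":
            continue
        totals[fields[0]] += int(fields[-1])
    return totals


LOG = [
    '192.0.2.1 - - [23/Nov/2008:10:00:00 +1100] "GET /a HTTP/1.0" 200 2326',
    '192.0.2.2 - - [23/Nov/2008:10:00:01 +1100] "GET /b HTTP/1.0" 200 100',
    '192.0.2.1 - - [23/Nov/2008:10:00:02 +1100] "GET /c HTTP/1.0" 200 500',
    '192.0.2.2 - - [23/Nov/2008:10:00:03 +1100] "GET /d HTTP/1.0" 304 -',
]
totals = bytes_per_ip(LOG)
for ip, size in sorted(totals.items()):
    print(ip, size)
```

The golfed solution repeats each IP address once per byte and then counts the occurrences, which is wonderfully short but builds a very large list; the Counter version keeps only one running total per address.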
Posted by
Tsukasa on November 23, 2008
I have written a few entries on Python golf before, but I have now decided to make an official competition of it. The rules are fairly simple: I pose a simple problem which must be solved (in Python) in the lowest number of characters (where a newline counts as one character). Solutions may be written for any 2.x version of Python (i.e. 2.3, 2.4, 2.5 or 2.6), and may use any library found in a default install on a Debian machine.
Solutions must be emailed to me. In the case where two people have the same character count, the solution that arrived in my inbox first will be declared the winner (this is to stop people from playing with the date header in the email ^^). The winner will receive a chocolate bar or coffee — their choice.
The problem this week is one of parsing log files. You must parse a log file in the Common Log Format that will be given to your program on stdin. You must then print on stdout each IP address and the amount of data that was sent to it. A sample program has been provided:
import re, sys
def main():
    clf_regexp = re.compile(r'''^(\S+)\s(\S+)\s(\S+)\s\[([^\]]*)\]\s"([^"]*)"\s(\d*)\s(\d*)$''')
    mapping = {}
    for line in sys.stdin:
        m = clf_regexp.match(line)
        if not m:
            continue
        ip, _, _, _, _, _, size = m.groups()
        size = int(size)
        if ip not in mapping:
            mapping[ip] = 0
        mapping[ip] += size
    for ip in mapping:
        print '%s %d' % (ip, mapping[ip])
if __name__ == "__main__":
    main()
Solutions will be accepted until 11:59:59pm(EDT) on Friday 28th November 2008.
Posted by
Tsukasa on October 7, 2008
While I was meant to be working on my thesis, I decided to update my little script to grab images from the built-in iSight camera. The older version depended upon an instance of procfs being started from within a login shell (which is not always possible).
To get around this, I started looking for a way to inject a process into a particular Mach bootstrap session (after reading the OS X internals book, I knew this was where I should be looking). Now, I will not claim that this script is the best thing I have written… It requires you to set up a rule in the sudoers file to allow the _www user to execute the specified sudo command (as root). I will leave writing this line as an exercise for the reader (I have written one version, it is just not completely secure). You will also have to grab the isightcapture binary from the net, and update the script with the correct location.
function pgrep {
    ps -A -o pid=,command= | grep "$1" | awk '{ print $1; }' | grep -v $$
}
function cleanup {
    if [ ! -z "$MYTEMP" ]; then
        rm -rf "$MYTEMP"
    fi
}
MYTEMP="$(mktemp -d)"
trap "cleanup" 15 0
LOGIN_WINDOW_PID="$(pgrep loginwindow.app)"
OUTPUT_FILENAME="${MYTEMP}/isightcapture.jpg"
sudo launchctl bsexec "${LOGIN_WINDOW_PID}" /Users/gregdarke/bin/isightcapture -t jpg "${OUTPUT_FILENAME}"
echo -en 'Content-type: image/jpeg\r\n\r\n'
cat "${OUTPUT_FILENAME}"
Posted by
Tsukasa on October 6, 2008
One thing that most programmers will cringe at is the thought of placing a database into a version control repository such as Subversion or Mercurial. Now, I know that many of us have done this for various reasons (I know I am guilty of it myself).
The point of this post is to show how this can be made a little nicer under Mercurial using encode/decode filters. With a carefully constructed set of filters, you are able to actually perform text diffs and make sane merges between repositories. All you have to do is drop the following into your hgrc file (either the one in your project or ~/.hgrc):
[encode]
data.db = tempfile: sqlite3 INFILE .dump > OUTFILE
[decode]
data.db = tempfile: sqlite3 OUTFILE '.read INFILE'
This requires that you have the sqlite3 binary installed; otherwise you will end up with a data.db file containing the raw SQL used to generate the database.
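If you want to see the round trip the filters rely on without touching a repository, the following Python 3 sketch does the equivalent using the standard sqlite3 module: Connection.iterdump produces the same kind of SQL text as the shell's .dump command, and executescript rebuilds a database from it. The table and helper names are invented for illustration, and this is only an analogy for what the filters do, not part of the Mercurial setup.

```python
import sqlite3


def dump_sql(conn):
    """Return the database as SQL text, similar to `sqlite3 data.db .dump`."""
    return "\n".join(conn.iterdump()) + "\n"


def restore(sql_text):
    """Rebuild an in-memory database from dumped SQL, like `.read`."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_text)
    return conn


original = sqlite3.connect(":memory:")
original.executescript("""
    CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT);
    INSERT INTO notes (body) VALUES ('first');
    INSERT INTO notes (body) VALUES ('second');
""")

text = dump_sql(original)   # what the encode filter would store
copy = restore(text)        # what the decode filter would rebuild
rows = copy.execute("SELECT id, body FROM notes ORDER BY id").fetchall()
print(text)
print(rows)
```

Because the stored form is plain SQL, a changed row shows up as a changed INSERT line in a diff, which is what makes text diffs and merges workable.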
Posted by
Tsukasa on September 26, 2008
Now, I have been thinking about this for a few days… Is it possible to create a tuple in Python that refers to itself? I don’t mean via some other object, so the following does not count:
def recursive():
    l = []
    t = (l,)
    l.append(t)
    return t
I believe it is possible to do directly from C, but I cannot think of a way to do it from within Python.
The idea of creating a recursive tuple was spawned from this little comment I found in the pickle source code: “… recursive tuples are a rare thing”. At first I thought it was talking about a tuple that directly refers to itself, but then figured that it must be talking about tuples that indirectly refer to themselves.
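To see the indirect case in action, here is a short Python 3 sketch that builds the tuple-via-list structure from above, pickles it, and checks that the copy still refers back to itself. It only demonstrates the indirect kind of recursion; it does not answer the question of building a directly self-referential tuple.

```python
import pickle


def recursive():
    l = []
    t = (l,)
    l.append(t)   # the list inside the tuple now contains the tuple
    return t


t = recursive()
print(t[0][0] is t)

# Round-trip through pickle; the cycle should be preserved in the copy.
clone = pickle.loads(pickle.dumps(t))
print(isinstance(clone, tuple), clone[0][0] is clone)
```

The cycle is only possible because the list is mutable: the tuple has to exist before it can be appended to the list it already contains.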
Posted by
Tsukasa on September 18, 2008
One of the most common complaints I hear from people about Python is that whitespace is significant. I have to disagree with them: I think that significant whitespace is an excellent feature in a programming language, though only if it is implemented correctly, which I believe it is not in Python.
My major complaint about Python is that it allows users to mix both tabs and spaces, and also different numbers of spaces, to mark a code block. I believe that Python should not allow this. I think there should be one consistent method of indenting used within a file. I would love it if this method was tabs, but I don’t really mind that much if it is spaces.
This problem usually occurs when you have multiple people modifying code, over an extended period of time. Each person has their own preferred style of indenting code, and will use it (generally without thinking about it).
A friend of mine recently came across this exact problem and, in an attempt to make the file “sane”, used a regular expression to fix the whitespace in the file. This ended up being a major problem, as they had inadvertently changed the meaning of the code by subtly changing its indentation levels. I came up with the idea of parsing the Python program, then outputting the program back from the parse tree (thus ensuring the meaning of the code is unchanged).
After being shown the compiler module, I started playing around with some code to reproduce source code from the abstract parse tree that was provided to me. When I went looking for documentation, I found that there is a piece of sample code provided with the Python source that does exactly what I was after (after a few bug fixes). This example is called unparse.py (located in the Demo/parser/ directory of the Python 2.5.x source code).
I suggest that anybody who is trying to fix inconsistent indentation in a Python file look at this program. There are a few things to note though:
- The code must already work as wanted
- This program will strip all comments from the program
- The code will lose all of its layout; that is, code that may have been split over multiple lines will now be on one line
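The same parse-then-regenerate approach can be sketched with the standard library of a much newer Python. The example below assumes Python 3.9 or later, where the ast module provides ast.unparse; it is not the unparse.py demo mentioned above, and the MESSY source is invented for illustration. It shows the indentation being normalised, the comment being dropped, and the syntax tree staying the same.

```python
import ast

# Two-space and eight-space indents in one function, a tab in the other.
MESSY = (
    "def sign(x):\n"
    "  if x < 0:   # negative numbers\n"
    "        return -1\n"
    "  return 1\n"
    "\n"
    "def double(x):\n"
    "\treturn x * 2\n"
)

tree = ast.parse(MESSY)
clean = ast.unparse(tree)   # regenerate source from the syntax tree
print(clean)

# The regenerated code parses to the same tree, so its meaning is unchanged.
same_tree = ast.dump(ast.parse(clean)) == ast.dump(tree)
print(same_tree)
```

Comparing the dumped trees is a cheap sanity check that the rewrite changed only the layout, which is exactly the guarantee the regular expression approach could not give.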