Python3, PyMARC, Unicode & File Opening

Subtle error. There is a serious underlying computing issue here: encoding.

You need to make sure you are opening the file in the correct mode in Python3. In Python2 it didn’t really matter except for the line endings.

For PyMARC you want to open the file in binary mode, open('filename.mrc', 'rb'), so you get bytes out of the file handle, not characters (class <str>).

from pymarc import MARCReader

# 'rb' hands MARCReader bytes rather than str
reader = MARCReader(open("CanadaGovt.mrc", 'rb'))
count = 0
for record in reader:
    count += 1

If that is all you need, stop here and it should just work. Keep going if you want an explanation, or skip to the end for a pointer to a really good presentation that explains the issue.

This is a bit strange and has to do with how Python 3 handles Unicode. Basically we want to get the raw binary out of the file.

The MARCReader chunks each bit of the transmission file and puts that binary chunk into the Record class.
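As a rough illustration of that chunking step: every MARC 21 record starts with its own total length written as five ASCII digits, so a reader can slice the byte stream one record at a time. The sketch below is a minimal stand-in under that assumption, not pymarc's actual implementation; `iter_raw_records` and the toy data are hypothetical names made up for the example.

```python
import io


def iter_raw_records(handle):
    """Yield each raw record (as bytes) from a binary file-like object.

    Assumes every record begins with its total length as five ASCII
    digits, as the MARC 21 leader does. Hypothetical helper, not pymarc's API.
    """
    while True:
        length_field = handle.read(5)
        if len(length_field) < 5:
            return  # end of file (or a truncated trailing fragment)
        total_length = int(length_field)  # int() accepts ASCII digit bytes
        rest = handle.read(total_length - 5)
        yield length_field + rest


# Toy stand-in for a transmission file: two fake "records" whose first
# five bytes give the full record length. Real records carry a 24-byte leader.
fake_file = io.BytesIO(b"00010hello" + b"00008abc")
chunks = list(iter_raw_records(fake_file))
print(chunks)
```

Note that the handle has to yield bytes for this to work, which is exactly why the file is opened with 'rb'.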

Inside Record (which you’ve actually opened up a bit) that binary chunk is then ‘decoded’ into strings, or in Python 3 terms, character strings which support Unicode.

To back up here: there are two string-like classes in Python 3. One is “class <str>”, which is a sequence of Unicode characters. The other is “bytes”, which is a sequence of single bytes (integers in the range 0-255; only the values below 128 correspond to ASCII). When you read a file in ‘rb’ mode you get the bytes out. If you just use ‘r’, Python decodes the bytes into characters for you, so a reader that expects raw bytes gets characters instead. The distinction matters because one character can take several bytes: for example, the ‘PIG’ emoji is actually four bytes long in UTF-8 (b'\xf0\x9f\x90\x96').
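To see the two modes side by side, here is a small sketch that writes the pig to a scratch file and reads it back both ways. The temporary file and variable names are only for illustration, and the text-mode read assumes the file is UTF-8.

```python
import os
import tempfile

pig = "\N{PIG}"  # one Unicode character, U+1F416

# Write the UTF-8 encoding of the character to a scratch file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(pig.encode("utf-8"))
    path = tmp.name

try:
    with open(path, "rb") as handle:  # binary mode: bytes come out
        raw = handle.read()
    with open(path, "r", encoding="utf-8") as handle:  # text mode: str comes out
        text = handle.read()
finally:
    os.remove(path)

print(type(raw), len(raw))    # a bytes object holding four bytes
print(type(text), len(text))  # a str object holding one character
```

Same file, same contents on disk; the mode only decides whether Python decodes the bytes before handing them over.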

If you do the following in a python3 interpreter I hope it helps:

>>> import unicodedata
>>> s = unicodedata.lookup("PIG")
>>> s
'🐖'
>>> s.encode('utf8')
b'\xf0\x9f\x90\x96'

In Python 2 it is much more complicated. There are also two string-like classes, but the roles are reversed. In Python 2 “class <str>” is a sequence of bytes and doesn’t know anything about the characters inside of it, while “class <unicode>” is a sequence of Unicode characters, like Python 3’s “class <str>”. To make things even more complicated, Python 2 lets you do operations which implicitly coerce between the two types, turning something like ‘hello’ + u’ world’ into u’hello world’, which is something Python 3 doesn’t allow.

Python 2:

Python 2.7.6 (default, Sep 9 2014, 15:04:36) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> u = u'Hello' + ' World!'
>>> u
u'Hello World!'

Python 3:

Python 3.4.3 (default, Feb 25 2015, 21:28:45) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> u = 'Hello' + b' World!'
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: Can't convert 'bytes' object to str implicitly
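In Python 3 the way out is to pick one type explicitly before combining. A minimal sketch (the variable names are just for illustration, and ASCII is assumed because the sample data is plain ASCII):

```python
greeting = "Hello"
tail = b" World!"

# Mixing the two types raises TypeError in Python 3.
try:
    greeting + tail
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Pick one type explicitly instead.
as_text = greeting + tail.decode("ascii")   # str + str
as_bytes = greeting.encode("ascii") + tail  # bytes + bytes
print(mixed_ok, as_text, as_bytes)
```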

PS: there is a really good presentation from PyCon that deals with a lot of what I explained above. For extra bonus points, reading through the pymarc module to see how it handles the Unicode sandwich would be a really good illustration.
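The “Unicode sandwich” is usually described as: decode bytes to str on the way in, work only with str in the middle, and encode back to bytes on the way out. Here is a minimal sketch of that shape. The function name is hypothetical and it assumes UTF-8 data; it is not how pymarc does it, and MARC records may also use MARC-8, which this does not handle.

```python
def shout_title(raw_title: bytes) -> bytes:
    """Hypothetical example: upper-case a UTF-8 encoded title."""
    text = raw_title.decode("utf-8")  # bottom slice of bread: bytes -> str
    text = text.upper()               # filling: work only with str
    return text.encode("utf-8")       # top slice of bread: str -> bytes


result = shout_title("Qu\u00e9bec".encode("utf-8"))
print(result.decode("utf-8"))
```

Because the upper-casing happens on str, the accented character is handled as one character rather than as two unrelated bytes.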

Waving a Dead Fish

I’ve been using Vagrant & Virtualbox for development on my OS X machines for my solo projects. But in an effort to get an intern started up on developing a front-end to a project I started a while ago I ran into a really strange problem getting Vagrant working on Windows.

So as a tale of caution for whatever robot wants to pick up this bleg.

Bootcamp partition on a Mid-2010 MacBook Pro. Running a dormant OS X and a full Windows 7. The Windows 7 is the main environment:

Use the git bash shell since it has SSH to stand up the boxes with vagrant init, vagrant up.

And then stuck (similar to Vagrant stuck connection timeout retrying):

==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
    default: Adapter 2: hostonly
==> default: Forwarding ports...
    default: 22 => 2222 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address:
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...

Well, we booted the VM with a head (the GUI window) and it looked like booting was interrupted by some sort of kernel panic due to:

Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.

Ok makes sense…the machine isn’t booting up and there has to be a reason why.

Long story short: the Windows 7 partition didn’t have virtualization enabled, and there is no BIOS setting or switch anywhere to turn it on. So what do you do?

How to enable hardware virtualization on a MacBook?

Like waving a dead fish in front of your computer.

  • Boot into OS X.
  • System Preferences > select the Startup Disk preference pane
  • Select the Boot Camp partition with Windows
  • Restart into the Boot Camp partition
  • Magic

Go figure

org-mode ctrl-a & ctrl-e

I had been using a customized ctrl-a and ctrl-e (beginning-of-line and end-of-line) in my Emacs.

(defun smart-beginning-of-line ()
  "Move point to first non-whitespace character or beginning-of-line.

If point was already at that position, move point to beginning of line."
  (interactive)
  (let ((oldpos (point)))
    (back-to-indentation)
    (and (= oldpos (point))
         (beginning-of-line))))

For those of you who are OS X users: ctrl-a and ctrl-e are basic Emacs keybindings which are bound in a similar way out of the box.

Org-mode has been my note taking, todo list, and everything else for a while. But one thing has bothered me: the keybindings haven’t quite been right. On a heading such as

* Tasks

ctrl-a should go to the logical beginning of the heading (the text), that is, to the T in Tasks. Instead, in Org-mode the cursor would jump to the literal beginning of the line, before the star. Uh-huh, that makes sense, but it isn’t what my brain _really_ wants.

Thus the amazingness of org and having a setting for everything:

(setq org-special-ctrl-a/e t)

Which smartly moves the cursor to where it should belong.

Thus the docstring speaketh:

org-special-ctrl-a/e is a variable defined in `org.el'.
Its value is t
Original value was nil

Non-nil means `C-a' and `C-e' behave specially in headlines and items.

When t, `C-a' will bring back the cursor to the beginning of the
headline text, i.e. after the stars and after a possible TODO
keyword. In an item, this will be the position after bullet and
check-box, if any. When the cursor is already at that position,
another `C-a' will bring it to the beginning of the line.

`C-e' will jump to the end of the headline, ignoring the presence
of tags in the headline. A second `C-e' will then jump to the
true end of the line, after any tags. This also means that, when
this variable is non-nil, `C-e' also will never jump beyond the
end of the heading of a folded section, i.e. not after the
ellipses.

When set to the symbol `reversed', the first `C-a' or `C-e' works
normally, going to the true line boundary first. Only a directly
following, identical keypress will bring the cursor to the
special positions.

This may also be a cons cell where the behavior for `C-a' and
`C-e' is set separately.

You can customize this variable.
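To make the `C-a` toggle described in that docstring concrete, here is a toy model of it written in Python rather than Emacs Lisp. It is a simplified sketch and not Org's implementation: the function name is made up, it only covers headlines (not list items or the `reversed` setting), and it assumes the only TODO keywords are TODO and DONE.

```python
import re

# Assumption: only these two TODO keywords exist; Org lets you configure more.
HEADLINE = re.compile(r"^(\*+ +)((?:TODO|DONE) +)?")


def special_ctrl_a(line: str, column: int) -> int:
    """Return the new cursor column after pressing C-a on `line`.

    Simplified model of the behaviour when the variable is t: the first
    press goes to the start of the headline text, a second press at that
    spot goes to column 0. Non-headline lines always go to column 0.
    """
    match = HEADLINE.match(line)
    if match is None:
        return 0
    text_start = match.end()
    return 0 if column == text_start else text_start


line = "* TODO Tasks"
first = special_ctrl_a(line, len(line))  # pressed from the end of the line
second = special_ctrl_a(line, first)     # pressed again straight away
print(first, second)
```

The first press lands on the T of Tasks, and the second falls back to the true beginning of the line.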


Gotcha: sRGB, Emacs 24, themes

I’ve been working with the Solarized color theme in my Emacs for a while. The Homebrew recipe for Emacs has an option to pull in a patch which corrects the Cocoa port of Emacs to handle sRGB colors correctly. But for the longest time I couldn’t get the colors to line up exactly with the references.

But I finally figured out that the theme was expecting a variable to be set:

(setq solarized-broken-srgb nil)

From the customize information:

Emacs bug #8402 results in incorrect color handling on Macs. If this is t (the default on Macs), Solarized works around it with alternative colors. However, these colors are not totally portable, so you may be able to edit the “Gen RGB” column in solarized-definitions.el to improve them further.

The gotcha is that setting this through customize may not take effect: in a lightly managed Emacs the default custom.el generally loads after init.el, while themes are normally loaded from init.el, either through a separate library or, as in mine, directly. So if you thought setting the variable in customize would work, you are wrong, because by the time the customized value is applied the theme has already been loaded.

So for me to load solarized with correct sRGB support:

(setq solarized-broken-srgb nil)
(load-theme 'solarized-dark t)

Installing Jekyll on OSX 10.9

Installing Jekyll

Recipe for installing RVM + latest stable Ruby + Jekyll on OS X 10.9. This is mostly so I can experiment with using GitHub Pages to publish web sites. I’m loosely following the instructions from GitHub on how to set Jekyll up.

Install RVM + Latest Stable Ruby

RVM isn’t provided as a formula in homebrew/homebrew since RVM installs on a per-user basis and does some other non-homebrew’y stuff. I’m depending on RVM’s autolibs feature to install all its dependencies automatically using Homebrew.

$ \curl -sSL | bash -s stable --ruby=

RVM will run and your shell should be set up to use the new Ruby.

Another option is to use the system ruby as the default and then invoke rvm to call the relevant ruby when we want it.

$ rvm --default use system
Now using system ruby.
Now using system ruby.
Warning! Executable 'ruby' missing, something went wrong with this ruby installation!
Warning! Executable 'gem' missing, something went wrong with this ruby installation!
Warning! Executable 'irb' missing, something went wrong with this ruby installation!
$ ruby -v
ruby 2.0.0p247 (2013-06-27 revision 41674) [universal.x86_64-darwin13]
$ which ruby

In that case I go back to the newly installed ruby to isolate it from the system environment:

$ rvm use 2.1.0

Install Jekyll with RubyGems

Bundler is installed automatically by RVM, so dependencies should be installed!

$ gem install jekyll

It Lives

$ jekyll new newProject
New jekyll site installed in /Users/gugek/Desktop/newProject.
$ cd newProject/
$ jekyll serve
Configuration file: /Users/gugek/Desktop/newProject/_config.yml
            Source: /Users/gugek/Desktop/newProject
       Destination: /Users/gugek/Desktop/newProject/_site
      Generating... done.
    Server address:
  Server running... press ctrl-c to stop.


org-mode agenda

I’ve been using Emacs’ org-mode to handle project and task tracking. There are a number of views in the agenda mode whose purpose wasn’t clear to me until I had to go back and see everything I’ve been doing for the last year:

‘a’ Agenda for current week or day

Week view of your (active) TODOs. With some options you can see archived and hidden events and TODOs. The default brings in any TODO that has an active timestamp or is scheduled. After you bring it up you can then change the view to include everything from the last month, year, or arbitrary date. This is the view you need if you want to see completed tasks in an archive: use the ‘Log-All’ function when you have the view up, and add the archive option if you are using an archive file.

‘L’ timeline

Timeline view of all date tagged items in the current org-mode buffer. Strangely, this view doesn’t respond to any of the agenda options, except for viewing things in logged format. You’d think it could give you an overview but it doesn’t.

‘t’ List of all TODOs

This is the list of active TODOs. A tasklist, which is configurable with a number of options to sort and surface the particular ones to the top.