Tuesday, August 23, 2016

Pair Programming Robots and having Fun!

I've had this idea for a lightweight pair programming app for a while now. One of the main inspirations is the iOS Letterpress app. If you don't know the game, it presents a square grid of letters, and you take turns trying to make the longest word you can from them. Pretty straightforward stuff, except that you can play and challenge people all around the world. What's really cool about the app is that you can have a little game with a total stranger. At least I assume you are actually playing with someone rather than against a robot; I'm not sure how drop-outs are managed. When I think about it, playing against robots would be a more reliable experience, but I don't know the stats for drop-outs. I've certainly dropped out of games, but perhaps I'm a sociopath and 99% of people carry on playing with each other till the end?

Anyway, these days there are lots of games where you compete with and even team up with strangers, e.g. Splatoon, League of Legends (LoL) and so on. I'd love to learn more about how these games match people up to try and maximise the user experience, as I think we have a related problem matching people up for pairing sessions in our "Agile Development using Ruby on Rails" MOOC. If Splatoon/LoL are the gaming equivalent of full-screenshare pairing, then simpler games like Letterpress would correspond to a sort of micro-pairing experience on a small toy problem.
Ever since I looked into the different styles of ping-pong pairing I've been fascinated by how protocols like "one undermanship" have a game-like feel. They remind me somehow of turn-based games like chess. So I keep thinking of a Letterpress-style micro-pair-programming game where you are involved in lightweight ping-pong pairing sessions with people all around the world.

Maybe nowhere near as many people are interested in pair programming as in playing word games, so maybe there would never be the critical mass to make it fun, ... unless, robots? I finally spent some time on a micro-pairing robot on a plane recently. There are a couple of challenges: one is working out the rules of this ping-pong pairing game I'm imagining, and another is getting a robot to pair program sensibly with you.

An instance of a pairing game might run like this:

The spec involves input/output pairs, e.g. 2 => 4, 3 => 6, 4 => 8 (in this case a numeric doubler). The "one undermanship" protocol involves writing the absolute minimum amount of code (which we could perhaps measure in terms of characters? code complexity?).

Pair A writes:

describe 'doubler' do
  it 'doubles a number' do
    expect(double(2)).to eq 4
  end
end


That's the initial failing test. So Pair B works to write the absolute minimum code to make this test pass, e.g.

def double(number)
  4
end

and then writes a new test that will fail

describe 'doubler' do
  it 'doubles a number' do
    expect(double(2)).to eq 4
  end

  it 'doubles a number' do
    expect(double(3)).to eq 6
  end
end

So Pair A writes the minimum to make this pass, in this case being intentionally obtuse (they could write number * 2, which would be fewer characters, but perhaps we can give them points for passing only the existing tests and not others?):

def double(number)
  return 4 if number == 2
  6 if number == 3
end

then Pair A adds another failing test:

describe 'doubler' do
  it 'doubles a number' do
    expect(double(2)).to eq 4
  end

  it 'doubles a number' do
    expect(double(3)).to eq 6
  end

  it 'doubles a number' do
    expect(double(4)).to eq 8
  end
end


And finally Pair B writes the general solution:

def double(number)
  number * 2
end

Of course Pair B could write:

def double(number)
  return 4 if number == 2
  return 6 if number == 3
  8 if number == 4
end
But it would be nice if we could somehow end on the more general case, with a stronger set of tests (for edge cases etc.; those could be built into the initial input/outputs). The key to making this an enjoyable game might be a scoring system, so that you get points for one-undermanship obtuseness up to a point, but past that point there's a refactoring bonus. Maybe the sensible approach is to only score a round of hard-coding when its complexity is actually less than that of the general solution?
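To make that concrete, here's a minimal sketch of one possible scoring rule; the function name and the character-count metric are just my assumptions, not part of any real implementation:

```ruby
# Hypothetical scoring rule: a hard-coded submission scores the number of
# characters it saves over the known general solution, and scores nothing
# once it is no shorter than the general solution.
def one_under_score(submission, general_solution)
  saved = general_solution.length - submission.length
  saved > 0 ? saved : 0
end

one_under_score('4 if number == 2', 'number * 2')  # => 0
one_under_score('4', 'number * 2')                 # => 9
```

Under a rule like this the obtuse `4 if number == 2` scores nothing, because it is already longer than `number * 2`, which matches the intuition that only genuinely simpler hard-coding should pay off.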

So there's also the issue of coming up with a range of simple coding problems that make this more interesting than the most trivial cases - I guess there's enough complexity in a few basic arithmetic problems, and we can collect more over time - there are great repositories like Codewars. Anyway, with any multi-player game we have the classic bootstrap problem: if we had a great game that lots of people were playing, then there would be lots of people to play with and it would be great; but initially there are no people playing it. So in the meantime can we scaffold the gameplay with pretend people? Can we write a robot pairer that can make a test pass, and generate a new chunk of code to move the ping pong on?

For a restricted set of cases I think the answer is yes. At least what I started on the plane was a chunk of code that would take and analyse a Ruby exception and write the necessary code to fix it. It's not very complex at the moment; it's basically this:

def fix_code(e)
  if e.class == NoMethodError
    /undefined method \`(.*)\' for main\:Object/ =~ e.message
    eval "def #{$1}; 'robot method' ; end"
  elsif e.class == ArgumentError
    /wrong number of arguments \(given (.*), expected \d\)/ =~ e.message
    num_args = $1.to_i # could use class of arg to auto-infer an appropriate name?
    arg_string = (0..num_args-1).map { |i| "arg#{i}" }.join(',')
    /\(eval\)\:1\:in \`(.*)\'/ =~ e.backtrace.first
    method_name = $1
    eval "def #{method_name}(#{arg_string}); 'robot method' ; end"
  else
    puts "cannot handle error class #{e.class}; #{e.message}"
  end
end


What it does at the moment is take NoMethodErrors and ArgumentErrors and fix things up so that the specified method is created with the correct number of arguments. Assuming that the game involves working through a set of input/output values on a range of basic arithmetic problems, I can imagine it being fairly easy to extend to make the majority of failing tests pass. Given an input/output pair, generating an RSpec test is pretty trivial. So a little more work here and one could have a basic ping-pong pairing partner. I don't fool myself that it wouldn't break fairly quickly, but I think rounds of polishing could make it work reasonably well for a range of introductory problems. Would it create a fun game that many people would want to play? Probably not ... Might it be a good learning experience for some people? ... maybe? I think the process of stack-trace/error analysis is quite interesting, and a nice feature would be to have the robot explain why it does what it does - they would be canned explanations, but they could highlight how the stack-trace/error has been analysed in order to work out what to do next.
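Just to show how trivial that generation step is, here's a sketch; the method name and the format of the generated spec are my own, not taken from any actual grader code:

```ruby
# Turn a single input/output pair into RSpec source for a named method.
def generate_spec(method_name, input, output)
  "it '#{method_name}(#{input})' do\n" \
  "  expect(#{method_name}(#{input})).to eq #{output}\n" \
  "end"
end

puts generate_spec('double', 3, 6)
# prints:
# it 'double(3)' do
#   expect(double(3)).to eq 6
# end
```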

I guess the best initial interface would be a command line game where the robot edits the file that you are both working on, perhaps? Having started it I'm kind of interested in extending it; we'll see if anyone else thinks this is anything other than mindless navel-gazing :-)

Monday, July 18, 2016

ReArchitecting the AutoGrader

So on Friday I followed through with my plans to get the rest of the FeatureGrader to expose errors in the students’ code to students, rather than just having it respond with “your code timed out or had an error”, and I think I was largely successful.

At least I got those few extra code changes deployed into production, and my manual tests through the edX interface showed me that my test examples would display full errors for RSpec failures, migration failures, and Rails failures. Of course I’m blogging before I’ve reviewed how things fared over the weekend, but it feels like a step in the right direction. Even if the students can’t understand the errors themselves, they can copy and paste the output, and perhaps a TA then has an increased chance of helping them.

I also wrapped my spike in tests like:

  Scenario: student submits a HW4 with migration_error
    Given I set up a test that requires internet connection
    Given an XQueue that has submission "hw4_migration_error.json" in queue
    And has been setup with the config file "conf.yml"
    Then I should receive a grade of "0" for my assignment
    And results should include "SQLException: duplicate column name: director: ALTER TABLE"

to check that the errors would actually be displayed even as we mutate the code. I have a selection of similar scenarios which feel like they are crying out to be DRYed out with a Scenario Outline. Similarly, with these tests in place I wonder if I can’t improve some of the underlying grading code; maybe we can re-throw these TestFailedError custom errors, which look like they might have been intended for communicating submitted-code errors back up to the grader. Instead of doing further refactoring, though, I found myself spending that time reaching out to individual students on the forums and in GitHub, adding some details about where the grader had been getting stuck for them, and encouraging them to re-submit since the grader had changed and they should now be able to see more details.

I just sneaked a peek at the GitHub comment thread, and while there are some new issues that could distract me from completing this blog, at the very least I can see some students deriving value from the new level of grader feedback. So grader refactoring? I continue to feel negative about that task. The nested sandboxes of the feature grader … the fear is that refactoring could open new cans of worms, and it just feels like we miss a huge chance by not having students submit their solutions via pull request.

So how would a PR-based grader work? Well, reflecting on the git-immersion grader that we developed for the AV102 Managing Distributed Teams course, we can have students submit their GitHub usernames and have the grader grab details from GitHub. We can get a list of comments from a PR, and so if we had CodeClimate, CI etc. set up on a repo and had students submit their solutions as pull requests, we could pull in relevant data using a combination of the repo name and their GitHub username.
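For instance, once the comments JSON has been fetched from the GitHub API, picking out the data relevant to a particular student might look something like this; the filtering logic and function name are my sketch, though the field names follow the GitHub API's comment objects:

```ruby
# Filter a GitHub API comments array (parsed JSON, so string keys)
# down to the bodies of comments made by a particular student.
def comments_by_student(comments, github_username)
  comments.select { |c| c['user']['login'] == github_username }
          .map    { |c| c['body'] }
end

comments = [
  { 'user' => { 'login' => 'student1' }, 'body' => 'Fixed the migration' },
  { 'user' => { 'login' => 'ta' },       'body' => 'Looks good' }
]
comments_by_student(comments, 'student1')  # => ["Fixed the migration"]
```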

Making pull requests would require students to fork repos (as they were originally encouraged to do) rather than clone them; switching back to that should not be a big deal. I was keen to remove forking since it didn’t really contribute to the experience of the assignment and was just an additional hoop to jump through. However, if submission is by PR then we want students to understand forking and pulling, and of course that’s a valuable real-world skill.

This means the solutions to the assignments would exist in even larger numbers of GitHub repos, but they already exist in plenty, so not much change there. What we might have, though, is students submitting assignments through a process that’s worth learning, rather than an idiosyncratic one specific to edX and our custom auto graders.

With a CI system like Travis or Semaphore we can run custom scripts to achieve the necessary mutation grading and so forth, although setting that up might be a little involved. The most critical step, however, is some mechanism for checking that the students are making git commits step by step. Particularly since the solutions will be available in even greater numbers, we need to ensure that students are not just copying a complete solution verbatim and submitting it in a single git commit. I am less concerned about students’ ability to master an individual problem completely independently, and more concerned with their being able to follow a git process where they write small pieces of code step by step (googling when they get stuck) and commit each to git.

So for example in the Ruby-Intro assignment I imagine a step that checks that each individual method solution was submitted in a separate commit, and that that commit comes from the student in question. Pairing is a concern there, but perhaps we can get the students set up so that a pairing session records one partner as git author and the other as committer, so that both are credited.

But basically we’d be checking that the first sum(arr) method was written and submitted in one commit, and then that max_2_sum(arr) was solved in a separate commit, and that the student in question was either the committer or the author on the assignment. In addition we would check that the commits were suitably spaced out in time, and of a recent date. The nature of the assignment changes here from being mainly focused on “can you solve this programming problem?”, to “can you solve this code versioning issue?”. And having the entire architecture based around industry standard CI might allow us to reliably change out the problems more frequently; something that feels challenging with the current grader architecture. The current grader architecture is set up to allow the publication of new assignments, but the process of doing so is understood by few. Maybe better documentation is the key there, although I think if there is a set of well tested assignments, then the temptation for many instructors and maintainers is just to use the existing tried and tested problems and focus their attention on other logistical aspects of a course.
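A sketch of those checks in Ruby might look like the following; the commit hashes, the message-matching convention and the spacing threshold are all hypothetical, and real data would come from `git log` or the GitHub API rather than being built by hand:

```ruby
require 'time'

# Check that each method was solved in its own commit credited to the
# student (as author or committer), and that consecutive commits are
# spaced at least min_gap seconds apart.
def commits_acceptable?(commits, student, methods, min_gap: 60)
  per_method_ok = methods.all? do |m|
    matching = commits.count do |c|
      c[:message] == "solve #{m}" &&
        [c[:author], c[:committer]].include?(student)
    end
    matching == 1
  end
  spacing_ok = commits.each_cons(2).all? do |a, b|
    Time.parse(b[:time]) - Time.parse(a[:time]) >= min_gap
  end
  per_method_ok && spacing_ok
end

commits = [
  { message: 'solve sum(arr)',       author: 'alice', committer: 'alice',
    time: '2016-07-18 10:00:00' },
  { message: 'solve max_2_sum(arr)', author: 'bob',   committer: 'alice',
    time: '2016-07-18 10:20:00' }
]
commits_acceptable?(commits, 'alice', ['sum(arr)', 'max_2_sum(arr)'])  # => true
```

Note that the second commit still counts for alice because she is the committer even though bob is the author, which is the pairing credit idea above.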

Using existing CI systems effectively removes a large portion of the existing grader architecture, i.e. the complex sandboxing and forking of processes. This removes a critical maintenance burden, handing that work to the many available CI services (Travis, Semaphore, CodeShip etc.) which provide it reliably and for free. Students also start to experience industry-standard tools that will help them pass interviews and land jobs. The most serious criticism of the idea is that students won’t be trying to solve the problems themselves; but google any aspect of our assignments and you’ll find links like this. The danger of the arms race to keep solutions secret is that we burn all our resources on that, while preventing students from learning by reviewing different solutions to the same problem.

I’m sure I’m highly biased, but it feels to me that having students submit a video of themselves pairing on the problem, along with a technical check to ensure they’ve submitted the right sorts of git commits, will reap dividends in terms of students learning the process of coding. Ultimately the big win would be checking that the tests were written before the code, which could be checked by asking students to commit the failing tests, and then commit the code that makes them pass. Not ideal practice on the master branch, but acceptable for pedagogical purposes perhaps … especially if we are checking for feature branches, and then even that those sets of commits are squashed onto master to ensure it always stays green …


I also reflect that it might be more efficient to use webhooks on the GitHub repos in question, rather than repeatedly querying the API (which is rate limited). We’d need our centralised autograder to store the data about all the student PRs so that we could ensure each student’s submission was checked in a timely fashion.
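As a sketch of that webhook side, handling an incoming `pull_request` event might look like this; the storage hash, handler name and repo name are mine (a plain hash stands in for the autograder's database), while the payload fields are those GitHub sends for pull request events:

```ruby
require 'json'

SUBMISSIONS = {}

# Record a student's latest pull request from a GitHub 'pull_request'
# webhook payload, so the grader can look it up later without polling
# the rate-limited API.
def handle_pull_request_event(raw_payload)
  payload = JSON.parse(raw_payload)
  return unless %w[opened synchronize].include?(payload['action'])
  pr = payload['pull_request']
  SUBMISSIONS[pr['user']['login']] = {
    repo:   payload['repository']['full_name'],
    number: pr['number']
  }
end

sample = JSON.generate(
  'action'       => 'opened',
  'pull_request' => { 'number' => 42, 'user' => { 'login' => 'student1' } },
  'repository'   => { 'full_name' => 'saasbook/hw-ruby-intro' }
)
handle_pull_request_event(sample)
# SUBMISSIONS now maps 'student1' to PR 42 on saasbook/hw-ruby-intro
```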

Monday, July 11, 2016

Pairing - it's the Logistics Stupid!

Pair programming is one of the more controversial practices associated with Agile and Extreme Programming. It inspires a full spectrum of emotions from love to hate. We’ve taken a new step in the current run of the “Agile Development using Ruby on Rails” MOOC (formerly known as Engineering Software as a Service), in making pair programming a requirement of getting a certificate. There are five programming assignments in the course, and we’re basing half of each assignment grade on a pair programming component. Learners submit pair programming videos and are assessed by peer review. Each learner needs to review two other pair programming videos for relevance, pairing role rotation and communication.

The peer review rubric is based on the following four questions:
  1. were you programming on the relevant assignment? (1 point)
  2. did you have a pair partner? (9 points)
  3. did you rotate driver/navigator roles frequently? (approx evenly 4 times an hour) (40 points)
  4. did you communicate effectively? i.e. regular talking, discussing the task at hand (50 points)
The rubric has evolved through several rounds of testing. The points are somewhat arbitrary, and set as they are in order to have them add up to 100 and match the 100 points that are received from the correctness of the separately submitted programming assignment code. They’re not completely arbitrary of course; having a video that shows you and a partner actually working on the correct assignment is an important gateway, and we consider good rotation and communication of higher and similar importance.

There’s a conflict here of course between trying to prevent people gaming the system (submitting irrelevant or borrowed videos) and encouraging effective pair programming behaviour. The peer review process itself is not without controversy. We’ve softened the wording under rotation to indicate that roles do not need to be rotated precisely every 15 minutes, and increased the range of rotation points that can be assigned. The rotation rubric now looks like this:
Did the participants rotate driver/navigator roles frequently? Ideally at least once every 15 minutes, i.e. this means that roughly every 15 minutes the person who is typing stops and allows their partner to type for the next 15 minutes, while switching to an advisory “navigator” role. Check particularly the time-indexes submitted.
Don’t be too strict - the point is that the participants rotate roles approximately four times an hour - so for example a rotation at 13 mins, then after 17 mins then after 16 mins and then 14 mins is fine.
  • No (0 points) There was no driver navigator rotation, only a single person contributing code during the video
  • Not so much (10 points) There was only one driver/navigator rotation per hour of coding
  • A couple of times (20 points) There were a couple of driver/navigator rotations per hour of coding
  • Several times (30 points) There were 3 or even 4 rotations, but they weren’t spaced out very well over an hour’s coding
  • Yes (40 points) There were at least 4 rotations and all the rotations were roughly evenly spaced throughout the pairing, i.e. at least one every 15 minutes
We introduced more gradations than the original Yes/No in response to feedback from learners. However, the other parts of the rubric are still Yes/No. We have a more graduated version of the communication rubric that we haven’t employed yet:
Did the pair continue to communicate effectively through the pairing process? Specifically, did the driver explain what they were doing as, or around, the process of them typing. Did the navigator ask questions or make suggestions or look up relevant documentation to the task at hand?
  • No (0 points) There was no communication or no helpful task focused communication between driver and navigator.
  • Not so much (10 points) There was almost no communication between driver and navigator, but they did communicate a little. Or the majority of communication was not helpful or task-focused.
  • Some (20 points) There were some occasional periods when the pair was communicating effectively, but there were longer periods of total silence and/or most of the communication was unrelated to the task and not helping the pair move forward.
  • A Fair Amount (30 points) There was communication but it was on and off, or communication that was unhelpful, e.g. talking off topic, getting angry etc.
  • Quite a lot (40 points) The communication was pretty good, but there was the occasional period with no communication of any kind, when perhaps it might have been helpful.
  • Yes, lots and very effectively (50 points) There was regular communication between the driver and the navigator. Although there might be occasional silences it is clear from the communication that driver and navigator are focused on solving the same task together, and they are using excellent communication skills to complete that task.
We’re holding off on this further extension of the rubric for fear of too much complexity. There are also issues arising from how the edX peer review training component works, where the learner has to match the instructor’s chosen grades on example videos, and so a more complex rubric leads to an even trickier peer review training process.

The edX peer review system is also a little tricky for some learners, since progress through the review training component is not obvious. That said, there is great support for learners to give feedback on the reviews they receive, and a nice admin interface to allow us to override peer review grading where necessary. I just wish I could hook it all up to a Slack bot via APIs ...

The peer review of pair programming videos is an experiment to see if we can effectively and reliably grade pair programming activity. Pair programming has been shown to have great benefits for learners in terms of understanding. The process of explaining technical concepts to others is hugely beneficial in terms of consolidating learning. We’ve encouraged learners in the MOOC to use the AgileVentures remote pairing system to schedule pairing events, and use a shared Gitter chat room for additional coordination.

The most challenging parts of MOOC pair programming appear to be the scheduling, and the frustrations arising from pair programming logistics. Anecdotally, a fair amount of time is spent in Google Hangouts waiting for a pair partner to appear; or conversely, one’s pair partner has to leave early. Some people feel nervous about the process of pair programming with a different stranger each week.
Some of this is specific to a MOOC, where participation drops precipitously as the weeks go by. For the first assignments it’s easy to find a partner, but if you join the course late, or simply move on to the later assignments where fewer learners are available, finding a pair partner in your timezone can be challenging.

Of course your pair partner doesn’t have to be in the same timezone as you. In principle there are learners in other timezones with different schedules that happen to overlap with yours. We don’t require learners to pair with others in the MOOC; they can pair with their room-mates, work colleagues, friends, whoever comes to hand. The only requirement is that a video is submitted. AgileVentures events provide a relatively convenient way to generate Hangouts on Air, which make it fairly straightforward to produce a YouTube screencast of a pairing session. Even the hangout itself can be used as a recording mechanism for local pairing sessions. There are, however, still too many points at which one can get stuck.

Have we found the perfect solution? Certainly not. The difficulty with a more rigid pair scheduling system in a MOOC is that we cannot rely on the participation of learners who are only involved on an ad-hoc basis. That said, my experience running a bootcamp is that an imposed pairing schedule is actually preferred by most learners, since it removes the cognitive overhead of negotiating with others about initiating pairing sessions.

Perhaps for the next run of the MOOC, we could have a more structured approach where we get groups of 8 together at specific times, with the assumption that at least 2 of the 8 will show up and will be able to pair … we’ll need to analyse all the pairing data in some detail first …

Friday, July 1, 2016

Delivering Features vs Testing them (and Jasmine Async)

Michael was a little late to our pairing session yesterday, and so I spent some time scratching some itches about the way in which our AgileBot pings the Gitter chat for the “Agile Development using Ruby on Rails” MOOC. I’d previously been frustrated in my attempts to add hyperlinking to the Slack messages, since the ‘request’ library seems to escape the Slack hyperlink syntax, and it wasn’t immediately clear how to address that. Any changes to the AgileBot might easily blow up since it’s a legacy app with no tests. Michael and I had been trying to rectify that this week, but we were moving slowly due to lack of familiarity with the CoffeeScript the AgileBot is written in; so I took the opportunity of some solo time to have a crack at reformatting the Gitter messages, which looked like this:

existing message format

No one had complained about them explicitly, but my intuition was that the long hangout link looked messy, and might be intimidating. Of course this is dangerous territory for Agile development. A common part of the Agile approach is to ensure that you are responding to customer needs, rather than incorporating features on a whim. However, there’s a big difference between building a system for a paying customer and edging a volunteer community towards critical mass. The paying customer gives you a concrete person to satisfy. In a community you’ll get suggestions, but they may not be practical, and as the saying goes, they might just tell you they want a faster horse.

Anyhow, the AgileVentures system has lots of little rough edges, itches I want to scratch, and scratching them makes me feel good. Remembering that Gitter supports the full markdown syntax (unlike Slack) I thought there might be a quick win and with a live chat I could easily get user feedback like so:

my suggestion in the Gitter chat

which got some immediate feedback


and allowed us to evolve the formatting a little further

evolving the formatting

The change involved a tiny reformatting to a string in the AgileBot, specifically:

send_gitter_message_avoid_repeats room, "[#{req.body.title} with #{user.name}](#{req.body.link}) is starting NOW!"

I pushed it to the staging server, tested it there, saw that it worked on our agile-bot test channel, and then pushed it live. In very short order we had the re-formatted messages coming into the channel:


I hadn’t been able to remove the time element, which is an artefact not of the AgileBot, but of the main site. I got a quick pull request in to fix that, but that will take a little longer to deploy.

Now maybe changing these Gitter links won’t make much difference to the community in the long run. At the very least they made me feel good and motivated on a Thursday. I hope the users in the chat get a positive feeling and are maybe inspired to get more involved in the community, and my biggest hope is that as we re-run “Agile Development using Ruby on Rails” that completely new users will have slightly less friction as they consider joining a pairing session.

I’m particularly glad that I spent the time solo-ing on the above, because the rest of the afternoon pairing with Michael was somewhat frustrating, in that we were repeatedly stuck on getting a basic test of the 3rd party HTTP connections for the AgileBot. We did make progress over the course of the session, but nothing that any end user would see soon. The AgileBot hits the 3rd party services Gitter and Slack to get its job done. If we are going to do proper integration tests, these services need to be stubbed. We tried the node equivalent of VCR: Sepia, by the folks at LinkedIn, which will record and save 3rd party HTTP interactions and allow them to be played back, effectively sandboxing an app. We got Sepia working; however, in playback mode it highlighted cache misses (unexpected network connections) by creating a file rather than throwing an error that could be caught by JasmineNode.

We set that aside and tried Nock, one of many node-world equivalents of WebMock, which allows precision stubbing of network connections. Personally I prefer approaches like VCR and Sepia that allow blanket recording of interactions, in contrast to WebMock and Nock which require you to write out individual network stubs by hand. We had Nock working, but ran up against node async issues. The JasmineNode tests were not waiting for the HTTP connections to complete, and we were tying ourselves in knots trying to get JasmineNode async to work in CoffeeScript.

We untied ourselves by dropping back to ground truth, by first getting the example Jasmine Async test working in pure JavaScript:

describe("Asynchronous specs", function() {
  var value;

  beforeEach(function(done) {
    setTimeout(function() {
      value = 0;
      done();
    }, 1);
  });

  it("should support async execution of test preparation and expectations", function(done) {
    value++;
    expect(value).toBeGreaterThan(0);
    done();
  });
});
That working, we converted it to CoffeeScript and confirmed that it still passed:

describe 'Asynchronous specs', ->
  value = undefined

  beforeEach (done) ->
    setTimeout (->
      value = 0
      done()
    ), 1

  it 'should support async execution of test preparation and expectations', (done) ->
    value++
    expect(value).toBeGreaterThan 0
    done()
then carefully inserted the elements from our AgileBot HTTP testing setup:

nock = require('nock')

# stub out the Slack endpoint (chat.postMessage here) so no real network call is made
slack = nock('https://slack.com')
          .post('/api/chat.postMessage')
          .reply(200, {
            ok: false,
            error: 'not_authed'
          })
avHangoutsNotifications = require('../scripts/av-hangouts-notifications.coffee')

describe 'AV Hangout Notifications', ->
  beforeEach ->
    routes_functions = {}
    avHangoutsNotifications({router: { post: (s,f) -> routes_functions[s] = f } })
    @routes_functions = routes_functions

  describe 'hangouts-video-notify', ->
    beforeEach (done) ->
      res = {}
      res.writeHead = -> {}
      res.end = -> {}
      req = { body: { host_name: 'jon', host_avatar: 'jon.jpg', type: 'Scrum' } }
      req.post = -> {}
      @routes_functions['/hubot/hangouts-video-notify'](req, res)
      setTimeout (->
        done()
      ), 3000

    it 'should support async execution of test preparation and expectations', (done) ->
      done()
and this would just crash the whole thing. So at least we had identified that the problem was with our code, not with how we happened to be implementing async testing in CoffeeScript in JasmineNode. It was precisely the line @routes_functions['/hubot/hangouts-video-notify'](req,res) that was causing the crash: the one that started the network connection. Having not yet set up an interactive debugger, it was console.log statements all the way down to discover that it was the Rollbar error-reporting component under our hood that was actually breaking everything. Clearly that needed to be disabled for testing purposes. That was achieved like so:

rollbar.init(process.env.ROLLBAR_ACCESS_TOKEN, {enabled: false})

and suddenly all the tests went green. It was a frustrating process, but it highlights the problem-solving approach of breaking out the different elements of the system you are testing in order to isolate where the problem actually exists. We’ve achieved a much better understanding of the legacy app as a result of all this. I’m just glad, for motivation’s sake, that I kicked out a minor improvement direct to the users before we spent the afternoon wrestling with testing frameworks :-)

Tuesday, June 28, 2016

Critical Mass of a Community

The holy grail of the AgileVentures community, and perhaps any community, is to achieve "Critical Mass".  That's the point at which the community becomes self-sustaining, and activity proceeds without relying on any one particular individual to keep it going.  "Critical Mass" is a term from physics which describes the threshold mass of nuclear material required to create a nuclear explosion.

In nuclear material it's the movement of particles called "neutrons" that causes individual atoms (in particular their atomic nuclei) to split apart, or undergo what's called nuclear fission.  What makes a nuclear explosion possible is that this process of fission releases additional neutrons, which can go on to cause other atoms to split apart.  If you have a large enough amount of the right material, it's almost inevitable that each neutron generated will collide with another atom as it travels through the material, generating more neutrons which collide with other atoms and so on.  This is called a chain reaction.  Have too little material and the neutrons will leave the material without having hit other atoms, and the chain reaction dies out.

Let's explore the analogy with a community, in particular a pair programming community.  Each pairing session could be considered an atom.  Assuming you have one pairing session take place (and it goes reasonably well), you'll end up with two people who are interested in pairing again.  They'll then be searching for other pairing sessions, but if there are none available, or none that they happen to be interested in (wrong programming language or platform), then it's likely these two will drift off and perhaps not try to pair in the same community again.  However if these two do find other pairing sessions, you can see how a single successful pairing event can lead to two more.  Assuming those sessions go well, you have four people now looking to pair, and so on.

Under the right conditions you can get a chain reaction.  It requires a critical mass of people taking part in pairing sessions.  Ideally whenever anyone wants to find a pair, there is always someone there ready to go.  Of course all this depends on people being able to find and join pairing sessions and also for them to go well.
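The chain-reaction analogy can be made concrete with a toy branching-process model: if each pairing session spawns, on average, more than one follow-up session, activity grows; below one, it fizzles out. The sketch below is purely illustrative (the function name, rates and mechanics are all made up, not anything measured at AgileVentures):

```ruby
# Toy branching-process model of pairing activity (illustrative only).
# Each session spawns some follow-up sessions; the average number of
# offspring per session decides whether activity explodes or dies out.
def simulate(reproduction_rate, generations, seed_sessions: 1, rng: Random.new(42))
  sessions = seed_sessions
  generations.times do
    offspring = 0
    sessions.times do
      # floor(rate) guaranteed offspring, plus one more with probability
      # equal to the fractional part of the rate
      offspring += reproduction_rate.floor
      offspring += 1 if rng.rand < reproduction_rate % 1
    end
    sessions = offspring
  end
  sessions
end

simulate(2.0, 3)   # => 8 (every session reliably spawns two more)
simulate(0.8, 20)  # sub-critical: usually dwindles to zero
```

With a rate of exactly 1.0 the community just limps along at its seed size; the point of the usability work described below is, in effect, to push that rate above 1.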

Too few people and there just aren't that many opportunities for pairing; but lots of people alone is not enough.  Imagine that lots of people are trying to pair, but problems with the interface mean that people trying to join a pairing session end up in the wrong location.  No pair partner, no pairing.  Michael and I uncovered one such problem with the AgileVentures interface last week: hangouts that had people in them were being reported as "not live" after 4 minutes.  This meant that on a fair number of occasions people attempting to join a hangout for pairing or for a meeting would find themselves on their own in a separate hangout.

We've just rolled out a fix for this, and hopefully it is another step towards achieving critical mass in the community.  It's unlikely to be the only step required, since creating a good pairing experience is more complex than nuclear fission.  We also want to adjust our user experience to maximise the chances of a good pairing experience for everyone.  It's not yet clear how best to do that, but getting two people into the same hangout at the same time is clearly an important prerequisite.  Things we're exploring include adding automated "pair rotation" timers to the hangout itself; having users rate their pairing experience; reflecting pairing activity through user profiles and so on.

We need to carefully monitor the changes and fixes we just made to see how the proportion of pairing event participation changes, and continue our Agile iterative process of making small changes and reflecting on their effect.  Making it more obvious which events are live might lead to more disruption in pairing events, or it might make observing ongoing pairing events easier, and that might make people more or less inclined to initiate their own pairing events.  It's not simple, but with careful measurement hopefully we can find that sequence of changes to the user experience that will lead to critical mass!

Friday, June 24, 2016

Analyzing Live Pair Programming Data

The recent focus for the AgileVentures (WebSiteOne) development team has been making the process of joining an online pair programming session as smooth as possible.  The motivation is two-fold: one, we want our users to have a good experience of pairing, and all the learning benefits it brings; two, we want large numbers of users pairing so that we can get data from their activity to analyse.  The latter motivation sort of feeds the first one really, since the point of analysing the data is to discover how we can serve the users better, but anyhow ... :-)

Several months back we created an epic on our waffle board that charted the flow from first encountering our site to taking part in a pairing session.  We identified the following key components:
  1. Signing Up
  2. Browsing existing pairing events
  3. Creating pairing events
  4. Taking the pairing event live
  5. Event notifications
  6. Event details (or show) page
  7. Editing existing events
The sequence is only approximate, as signing up/in is only required if you want to create an event, not to browse and join events.  The important thing was that various minor bugs were blocking each of the components.  We set about smoothing the user experience for each of them, including sorting out GitHub and G+ signup/signin issues, providing filtering of events by project, setting appropriate defaults for event creation and ironing out bugs in event edit and update, not to mention delivering support for displaying times in the user's timezone, and automatically setting the correct timezone based on the user's browser settings.

There are still other points that could be smoothed out, but we've done a good portion of the epic.  The question that partly troubles me now is how to "put it to bed".  A new epic containing only the remaining issues is probably the way to go.  More importantly, we've finally got to the point where we can start analysing some data: the notifications for the edX MOOC pairing activity are flowing to the MOOC Gitter chat fairly reliably, we've just broken through on removing key confusions about joining an event, and we've worked out some problems with events displaying whether they are live.

This last element is worth looking at in a little more detail as it strongly affects the type of data we are gathering.  Creating (and tracking) Google Hangouts for pairing from the AgileVentures site involves creating a Google Hangout with a particular plugin, called HangoutConnection, that knows the server-side event it is associated with.  This was originally designed by Yaro Apletov and is written in CoffeeScript.  It gets loaded when the hangout starts and attempts a connection back to the main AgileVentures site.  Given successful contact, an EventInstance object is created in association with the event.  This EventInstance includes information about the hangout, such as its URL, so that other people browsing the site can also join the hangout without being specifically invited.  The HangoutConnection continues to ping the site every two minutes, assuming the hangout is live, the plugin hasn't crashed and so on.
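The server side of that ping cycle can be sketched roughly as follows. This is a minimal plain-Ruby sketch with hypothetical names and shapes (the real WebSiteOne controllers and models differ); the key idea is that "live" status is derived from how recently an event's instance was pinged:

```ruby
# Sketch of server-side handling of HangoutConnection pings
# (hypothetical names; not the actual WebSiteOne code).
LIVE_WINDOW = 4 * 60  # seconds of silence before an event stops being "live"

EventInstance = Struct.new(:event_id, :hangout_url, :updated_at) do
  def live?(now)
    now - updated_at < LIVE_WINDOW
  end
end

# Every ping must refresh updated_at; refreshing it only on the first
# ping produces exactly the premature "not live" symptom described below.
def handle_ping(store, event_id, hangout_url, now)
  if (instance = store[event_id])
    instance.updated_at = now
  else
    store[event_id] = EventInstance.new(event_id, hangout_url, now)
  end
  store[event_id]
end
```

With pings arriving every two minutes and a four-minute liveness window, an event stays live only so long as repeat pings keep landing.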

What Michael and I identified on Wednesday was that only the first of these pings actually maintained the live status, making it look like all our pairing hangouts were going offline after about 4 minutes.  This had been evidenced by the "live now" display disappearing from events somewhat sooner than appropriate.  It might seem obvious, but the focus had been on fixing many other pressing issues and usability concerns from the rest of the epic.  Now that those are largely completed this particular problem has become much clearer (it was also obscured for the more regular scrums, which use a different mechanism for indicating live status).  One might ask why our acceptance tests weren't catching this issue.  The problem was that the acceptance tests were not simulating the hit of the HangoutConnection to our site; they were manipulating the database directly, and thus, as is often the case, the bug lived in exactly the bit that wasn't covered by a test.  Adjusting the tests to expose the problem made the fix relatively straightforward.

This is an important usability fix that will hopefully create better awareness that hangouts are live (with people present in them), and increase the chances of people finding each other for pairing.  There's a lot more work to do however, because at the moment the data about hangout participants that is sent back from HangoutConnection gets overwritten at each ping.  The Hangout data being sent back from HangoutConnection looks like this:

    "0" => {
                   "id" => "hangout2750757B_ephemeral.id.google.com^a85dcb4670",
        "hasMicrophone" => "true",
            "hasCamera" => "true",
        "hasAppEnabled" => "true",
        "isBroadcaster" => "true",
        "isInBroadcast" => "true",
         "displayIndex" => "0",
               "person" => {
                     "id" => "123456",
            "displayName" => "Alejandro Babio",
                  "image" => {
                "url" => "https://lh4.googleusercontent.com/-p4ahDFi9my0/AAAAAAAAAAI/AAAAAAAAAAA/n-WK7pTcJa0/s96-c/photo.jpg"
                     "na" => "false"
               "locale" => "en",
                   "na" => "false"

Basically the current EventInstance only stores a snapshot of who was present in the hangout the last time the HangoutConnection pinged back; data from pings after the first two-minute update has simply been discarded.  We're about to fix that, but here's the kind of data we can now see about participation in hangouts:

#participants #hangouts
1             *
1             *
1             ****
2             *
3             **
1             ****
1             *
2             *
3             *
1             *
1             ***************
2             *
3             *
1             ******************************
2             ****
4             **

The above is just a snapshot that corresponds to the MOOC getting started; we're working on a better visualisation for the larger data set.  We can see a clear spike in the number of hangouts being started, and a gradual increase in the number of hangouts with more than one participant, remembering that the participant data is purely based on who was present two minutes into the hangout.
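A tally like the one above can be derived with a few lines of Ruby. This sketch assumes a hypothetical data shape (one hash per hangout with a `:participants` list) and collapses everything into a single global count per participant number, rather than the time-ordered runs shown in the table:

```ruby
# Tally hangouts by participant count (hypothetical data shape).
# Returns [[participant_count, number_of_hangouts], ...] sorted by count.
def participant_histogram(hangouts)
  hangouts.group_by { |h| h[:participants].size }
          .sort
          .map { |size, group| [size, group.size] }
end

hangouts = [
  { participants: ['a'] },
  { participants: ['a', 'b'] },
  { participants: ['a'] },
]
participant_histogram(hangouts)  # => [[1, 2], [2, 1]]
```

Rendering each pair as a row of asterisks then reproduces the bar-chart style used above.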

If the above data were reliable we might be saying: wow, we have a lot of people starting hangouts and not finding a pair partner.  That might be the case, but it would be foolish to intervene on that basis using inaccurate data.  Following the MOOC chat room I noticed some students at the beginning of the course mentioning finding hangouts empty, but the mood music seems to have moved towards people saying they are finding partners; and this is against the backdrop of all the usability fixes we've pushed out.

To grab more accurate participation data we would need to do one or more of the following:
  1. adjust the EventInstance data model so that it had many participants, and store every participant that gets sent back from the HangoutConnection
  2. store the full data sent back from every HangoutConnection ping
  3. switch the HangoutConnection to ping on participant joining and leaving hangouts rather than just every two minutes
  4. ruthlessly investigate crashes of the HangoutConnection
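Option 1 might look something like the following sketch, with plain Ruby standing in for the ActiveRecord models and all names hypothetical: instead of overwriting the latest snapshot, merge each ping's participant data so the union of everyone ever seen survives.

```ruby
# Sketch of option 1: accumulate participants across pings instead of
# overwriting the latest snapshot (hypothetical names, not the real models).
class EventInstanceWithParticipants
  attr_reader :participants  # person id => display name

  def initialize
    @participants = {}
  end

  # Called on every HangoutConnection ping, with the data shape shown
  # earlier (entries keyed "0", "1", ... each containing a "person").
  def record_ping(participant_data)
    participant_data.each_value do |entry|
      person = entry["person"]
      @participants[person["id"]] = person["displayName"]
    end
  end
end

instance = EventInstanceWithParticipants.new
instance.record_ping("0" => { "person" => { "id" => "123456", "displayName" => "Alejandro Babio" } })
instance.record_ping("0" => { "person" => { "id" => "654321", "displayName" => "Another Pairer" } })
instance.participants.size  # => 2
```

Keying by person id also deduplicates repeat pings from the same participant, which matters once pings arrive every two minutes.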
With reliable data about participation in pairing hangouts we should be able to assess some objective impact of our usability fixes as they roll out.  We might find that there are still lots of hangouts with only one participant, in which case we'll need to investigate why, and possibly improve awareness of live status and further smooth the joining process.  We might find that actually the majority of hangouts have multiple participants, and then we could switch focus to a more detailed analysis of how long participants spend in hangouts, getting feedback from pair session participants about their experience, and moving to display pairing activities on user profiles to reward them for diligent pairing activities and encourage repeat pairing activities.

Personally I find this all intensely fascinating to the exclusion of almost everything else.  There's a real chance here to use the data to help adjust the usability of the system to deliver more value and more positive learning experiences.

Monday, June 20, 2016

Moving Beyond Toy Problems

What is life for?  To follow your passion. What is my passion? I find myself frustrated with the closed source, closed development nature of many professional projects; and on the flip side equally frustrated with the trivial nature of academic "toy" problems designed for "learning".

I love the "in principle" openness of the academic sphere and the "reality bites" of real professional projects.  I say "in principle" about academic openness, because while the results of many experiments are made freely available, sharing the source code and data (let alone allowing openness as to the process) is often an afterthought if it is there at all.  The MOOC revolution has exposed the contents of many university courses which is a fantastic step forward, but the contents themselves are often removed from the reality of professional projects, being "toys" created for the purpose of learning.

Toy problems for learning make sense if we assume that learners will be intimidated or overwhelmed by the complexity of a real project.  Some learners might be ready to dive in, but others may prefer to take it slow, step by step.  That's great - I just don't personally want to spend my time devising toy problems, or at least not the majority of my time.  It also seems to me that the real learning lies in the repeated compromises one has to make in order to get a professional project out the door: balancing the desire for clarity, maintainability, readability and craftsmanship against getting features delivered and actually having an impact that someone cares about.

Professional projects are typically closed source, closed development, although there are more and more open source projects in the professional sphere.  The basic idea seems to be: we are doing something special and valuable, and we don't want you to see our secret sauce, or the mistakes we are making along the way.  Thus it might be considered anti-competitive for a company to reveal too much about the process it uses to develop its software products.  That said, companies like ThoughtBot publish their playbook, giving us an insight into their process and perhaps increasing our trust that their process is a good one.  Even so we don't get to see the "actual" process, which is not ideal for others trying to learn; but then most companies are not trying to support the learning process for those outside.

Personally I want to have a global discussion that everyone can take part in, if they want to.  I want an informed debate about the process of developing software where we have real examples from projects - real processes - where we can all look at what actually happened rather than relying on people's subjective summaries.

Maybe this is just impossible, and an attempt at the pure "open development" process of AgileVentures is destined to fail because by exposing exactly how we do everything we can't build up value to sustain our project?  That's what professional companies do right?  They have a hidden process, focus attention on the positive results of that process and then increase the perception that they have something worth paying for.  To the extent that they are successful they are building up reputation that will sustain them with paying customers, because those customers are inclined to believe the chance is good they'll get value for money.

If the customer had totally transparent access to every part of what goes on, they could just replicate it themselves, right? :-) Or a competitor could provide the same service for less?  However there's a strength in openness: it shows that you believe in yourselves, and it demonstrates that you've been through the school of hard knocks and maybe you are the right people to adapt fast to the next change, even if others could copy aspects of what you do.

Everyone should have the opportunity to learn and boost their skills by taking part in such an open process. The shy might not want to participate directly, but they can learn by observing a real process that they otherwise wouldn't see until they actually start a job.  It's the old catch-22: "no job, no experience; no experience, no job".

This is what I stand for, what AgileVentures stands for.  An open process beyond open source, we call it "Open Development".  Meetings with clients, planning meetings, coding sessions, everything is open and available to all.  Where customers have confidential data, that is hidden, but otherwise we are as open as possible.  Of course that generates too much material for anyone to absorb, and we need help curating it, but the most amazing learning happens when new folk engage and take part in the process - thinking about the code in their pull requests in relation to the needs of a non-technical client being articulated by a project manager who cares about their learning, but also about the success of the project.  Come get involved, be brave, be open and get that experience that you can't get without a job, or that you can't get in your current job.