Autolab Development Blog Read on for release news, feature spotlights, development insights, and more! https://autolab.github.io/ Mon, 23 Apr 2018 03:59:16 +0000 Autolab and Docker <p><img src="/assets/docker-autolab.svg" alt="" /></p> <p>Autolab started in 2010 and had a very humble beginning. It was used for only one class with ~300 students and simply ran on one laptop! Following that, its merits were realized by other courses at Carnegie Mellon, resulting in nothing short of a <em>success catastrophe</em>: ~3000 students use it every semester, 77 courses have used Autolab as of Fall 2014, and autograding jobs run on five Dell R410 machines with 2x Intel E5520 CPUs and 8 Nehalem cores.</p> <p>While it’s really fun to mess around with these sweet, kickass machines, we realized we had to snap back to reality and think about a better way to run autograding jobs. This post outlines some basic autograding concepts, describes some problems we faced with the existing model, and finally explains how Docker is integrated and how we hope it solves all our problems!</p> <h1 id="what-it-takes-to-autograde">What it takes to Autograde</h1> <p><strong>Terminology:</strong> Tango is the autograder used by Autolab.</p> <p>A guarantee that Tango must provide is that the environment in which a job runs remains homogeneous between jobs. Hence, Tango executes these jobs on virtual machines that are booted with user-specified images. In order to run jobs on a particular platform such as Amazon EC2, OpenStack or even Rackspace, the Virtual Machine Management (VMM) interface must be implemented for that platform.
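</p>

<p>Concretely, implementing the VMM interface for a new platform means providing a handful of lifecycle methods. A rough Python sketch of such an interface (the method names are Tango’s; the signatures and docstrings here are illustrative assumptions, not the exact ones from the codebase):</p>

```python
# Illustrative sketch of the VMM interface as an abstract base class.
# Method names match Tango's interface; signatures are assumptions.
from abc import ABC, abstractmethod


class VMMS(ABC):
    @abstractmethod
    def initializeVM(self, vm):
        """Initialize a VM for this platform."""

    @abstractmethod
    def waitVM(self, vm, max_secs):
        """Block until the VM is accessible, up to max_secs seconds."""

    @abstractmethod
    def copyIn(self, vm, input_files):
        """Copy all input files needed to run the job into the VM."""

    @abstractmethod
    def runJob(self, vm, timeout, max_output_size):
        """Invoke the autodriver to run the job inside the VM."""

    @abstractmethod
    def copyOut(self, vm, dest_file):
        """Copy the feedback file out of the VM."""

    @abstractmethod
    def destroyVM(self, vm):
        """Tear down the VM."""

    @abstractmethod
    def getVMs(self):
        """Return the list of VMs in each pool."""
```

<p>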
The lifecycle of a job is governed by the <a href="https://github.com/autolab/Tango/blob/master/vmms/localDocker.py">VMM interface</a> outlined below:</p> <p><img src="/assets/autograde.svg" alt="" /></p> <ul> <li> <p><code class="highlighter-rouge">initializeVM</code> - Initializes a VM for that platform.</p> </li> <li> <p><code class="highlighter-rouge">waitVM</code> - Waits for a VM to become accessible.</p> </li> <li> <p><code class="highlighter-rouge">copyIn</code> - Copies in all input files needed to run the job.</p> </li> <li> <p><code class="highlighter-rouge">runJob</code> - Invokes the autodriver to run the job. The autodriver instruments the job, runs it in a controlled environment, and restores the system state after the job completes.</p> </li> <li> <p><code class="highlighter-rouge">copyOut</code> - Copies the feedback file out of the VM.</p> </li> <li> <p><code class="highlighter-rouge">destroyVM</code> - Destroys a given VM.</p> </li> <li> <p><code class="highlighter-rouge">getVMs</code> - Gets the list of VMs in each VM pool.</p> </li> </ul> <p>At present, the VMM interface has been implemented for local Linux machines and Amazon EC2. In order to execute a job, Tango’s job manager must check if there is an available VM to run the job (<code class="highlighter-rouge">getVMs</code>), ensure that the VM is reachable (<code class="highlighter-rouge">waitVM</code>), and then perform the <code class="highlighter-rouge">copyIn</code>, <code class="highlighter-rouge">runJob</code>, and <code class="highlighter-rouge">copyOut</code> operations.</p> <h1 id="integrating-docker">Integrating Docker</h1> <p>Docker containers provide process, network and memory isolation. They are also designed to be multi-tenant citizens on any host they run on. A combination of these properties makes Docker containers ideal for running autograding jobs.
Furthermore, clients have the flexibility of <a href="https://docs.docker.com/userguide/dockerimages/">building and running jobs in any Docker image</a> with customized software.</p> <p>In the context of Docker, here is what the VMM interface does under the hood:</p> <ul> <li> <p><code class="highlighter-rouge">initializeVM</code>: Assigns a host machine for the container to run on, using round-robin assignment.</p> </li> <li> <p><code class="highlighter-rouge">waitVM</code>: Ensures the host machine is reachable over SSH.</p> </li> <li> <p><code class="highlighter-rouge">copyIn</code>: Creates a directory on the host machine and copies all input files to that directory.</p> </li> <li> <p><code class="highlighter-rouge">runJob</code>: Starts a Docker container from the given Docker image and mounts the previously created directory as a volume onto that container. Then, runs the job and writes the feedback file to that volume.</p> </li> <li> <p><code class="highlighter-rouge">copyOut</code>: Copies the feedback file from the volume to Tango’s directory of feedback files.</p> </li> <li> <p><code class="highlighter-rouge">destroyVM</code>: Destroys the Docker container (<code class="highlighter-rouge">docker rm CONTAINER</code>) and its corresponding volume directory on the host machine.</p> </li> <li> <p><code class="highlighter-rouge">getVMs</code>: Gets the list of volume directories that are currently in use.</p> </li> </ul> <p>Clearly, implementing the VMM interface using Docker containers has greatly simplified each function! With this approach, getting an end-to-end instance of Autolab with autograding is significantly easier. Be sure to check out the <a href="https://github.com/autolab/Tango/wiki/Tango-with-Docker">wiki</a> for specific setup instructions.</p> <h1 id="next-steps">Next Steps</h1> <p>Implementing the Docker VMMS was the easy part. What keeps most engineers awake at night is, of course, testing. While we have tested this implementation by running specific labs hundreds of times simultaneously, we know that is not enough.
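</p>

<p>Each of those test runs exercises the same <code class="highlighter-rouge">copyIn</code>/<code class="highlighter-rouge">runJob</code>/<code class="highlighter-rouge">copyOut</code> path described above. As a hedged sketch of what that path amounts to in Docker terms (the image name, mount point, and grading command here are illustrative, not Tango’s exact invocation):</p>

```python
# Sketch of one autograding job via the docker CLI; all names illustrative.
import os
import shutil
import subprocess
import tempfile


def docker_run_command(image, volume, feedback_name="feedback.out"):
    """Build the docker invocation: bind-mount the staging directory
    and capture the grader's output in the feedback file."""
    return ["docker", "run", "--rm",
            "-v", f"{volume}:/home/autograde",
            image,
            "sh", "-c", f"cd /home/autograde && make > {feedback_name} 2>&1"]


def run_autograding_job(image, input_files, feedback_name="feedback.out"):
    volume = tempfile.mkdtemp(prefix="tango-job-")       # initializeVM target
    for path in input_files:
        shutil.copy(path, volume)                        # copyIn
    try:
        subprocess.run(docker_run_command(image, volume, feedback_name),
                       check=False)                      # runJob
        with open(os.path.join(volume, feedback_name)) as f:
            return f.read()                              # copyOut
    finally:
        shutil.rmtree(volume, ignore_errors=True)        # destroyVM cleanup
```

<p>Here <code class="highlighter-rouge">--rm</code> plays the role of <code class="highlighter-rouge">destroyVM</code> for the container itself, while removing the staging directory cleans up the volume.</p>

<p>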
We are currently working on writing A/B tests that will run an incoming job on both Docker and Tashi. We hope this will give us some clarity on the correctness of this approach. Maybe then, we can think about performance testing.</p> <p>In the meantime, we’d love to hear any feedback, suggestions, criticisms, adulations or rants about what we have done so far and what you think would help us next.</p> <p>Thanks for reading!</p> Mon, 25 May 2015 05:40:07 +0000 https://autolab.github.io/2015/05/autolab-and-docker/ https://autolab.github.io/2015/05/autolab-and-docker/ autograding docker tango Making Autolab's Backend Scalable <p><a href="https://github.com/autolab/Tango">Tango</a> is a stand-alone, RESTful service that Autolab uses as the back-end for autograding. Tango receives grading jobs from Autolab’s front-end, adds them to a job queue, assigns them to available containers for grading, and shepherds each job through the process. In its early days, Tango was mostly used for jobs that ran in under 5 seconds. However, over recent semesters, Autolab has grown to host classes like Distributed Systems, Machine Learning, and Storage Systems: classes with significantly higher compute requirements. As we looked into how to handle larger loads, we were running into the problem of how to manage queued jobs and distribute them to different instances in our back-end.
To this end, we decided to take the first step toward turning Tango into a distributed system by implementing persistent memory using Redis and adopting the producer-consumer model.</p> <p><img src="/assets/redis1.svg" alt="Tango's initial Architecture" /> <em>Initial Tango architecture with in-memory job queue</em></p> <p>The existing monolithic model with a single process and an in-memory queue was posing multiple problems:</p> <ul> <li>Running a web server that handled large file uploads concurrently along with a job manager that dispatched jobs from the job queue to new threads made the process prone to crashes.</li> <li>Crashes caused all the jobs stored in the in-memory queue to disappear.</li> <li>After a crash, all the containers running a job would suddenly be left in a so-called limbo state because they were no longer managed by any Tango process.</li> </ul> <p>Therefore we decided to improve our architecture by switching to a multi-process model with a persistent queue.</p> <h2 id="persistent-memory-model-using-redis">Persistent Memory Model using Redis</h2> <p>The in-memory job queue was the barrier to making the system robust. We wanted to store jobs on an independent system and make them <em>persistent</em>.</p> <p><img src="/assets/redis2.svg" alt="Tango with Persistent Memory Model" /> <em>Tango with Persistent Memory Model</em></p> <p>In this persistent memory model, even if Tango restarts it will keep the same job queue. We could even spin up multiple instances of Tango and have them all work concurrently, sharing the same queue (with a couple of problems which I will mention in the next section).</p> <h3 id="choice-of-redis">Choice of Redis</h3> <p>We started looking into possible solutions: traditional relational databases like MySQL, document-oriented databases such as MongoDB, message queues such as RabbitMQ, and key-value stores such as Redis.
Looking at the amount of data and the level of nestedness, we decided that we don’t need a database with a full range of functionality and robustness. We needed something fast and simple to use. Our old job queue was just a simple Python dictionary; therefore, a fast key-value store like <em>Redis</em> seemed like the right fit. Setting it up and playing around with it was a breeze, so we decided to go with this option.</p> <h3 id="implementation">Implementation</h3> <p>While making the switch, we wanted to keep the ability to run Tango without a dependency on an external service, both to preserve backwards compatibility and to simplify setting up a local development environment.</p> <p>In order to achieve this, we created (pseudo-)abstract classes that include the common methods and decide what is used under the hood based on the configuration.</p> <p>Since Python doesn’t have abstract classes in the traditional sense, we define methods that instantiate the appropriate class and return it.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Factory that decides whether to instantiate a</span> <span class="c"># TangoRemoteDictionary or a TangoNativeDictionary.</span> <span class="c"># Since Python lacks traditional abstract classes, we use a simple function.</span> <span class="k">def</span> <span class="nf">TangoDictionary</span><span class="p">(</span><span class="n">object_name</span><span class="p">):</span> <span class="k">if</span> <span class="n">Config</span><span class="o">.</span><span class="n">USE_REDIS</span><span class="p">:</span> <span class="k">return</span> <span class="n">TangoRemoteDictionary</span><span class="p">(</span><span class="n">object_name</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="n">TangoNativeDictionary</span><span class="p">()</span> </code></pre></div></div> <p><a
href="https://github.com/autolab/Tango/blob/master/tangoObjects.py#L232">See the full code here.</a></p> <p>In the future, we can have other implementations, such as MongoDB, and just add them as options here. For example:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">TangoDictionary</span><span class="p">(</span><span class="n">object_name</span><span class="p">):</span> <span class="k">if</span> <span class="n">Config</span><span class="o">.</span><span class="n">USE_REDIS</span><span class="p">:</span> <span class="k">return</span> <span class="n">TangoRemoteDictionary</span><span class="p">(</span><span class="n">object_name</span><span class="p">)</span> <span class="k">elif</span> <span class="n">Config</span><span class="o">.</span><span class="n">USE_MONGO</span><span class="p">:</span> <span class="k">return</span> <span class="n">TangoMongoDictionary</span><span class="p">(</span><span class="n">object_name</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="n">TangoNativeDictionary</span><span class="p">()</span> </code></pre></div></div> <h3 id="serialization-and-marshalling">Serialization and Marshalling</h3> <p>Redis, being a key-value store, only accepts strings as values. Even though the objects we are trying to store are quite simple and flat, we do not want to create a new key-value pair for every attribute of the class.
Instead, we decided to serialize what we call the ‘remote objects’ using the <code class="highlighter-rouge">pickle</code> module included in the standard Python library.</p> <p>A sample method to put the object into the store looks like this:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">set</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">,</span> <span class="n">obj</span><span class="p">):</span> <span class="n">pickled_obj</span> <span class="o">=</span> <span class="n">pickle</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">r</span><span class="o">.</span><span class="n">hset</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">hash_name</span><span class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="nb">id</span><span class="p">),</span> <span class="n">pickled_obj</span><span class="p">)</span> <span class="k">return</span> <span class="nb">str</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span> </code></pre></div></div> <p>To get it back you would need:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="p">):</span> <span class="n">unpickled_obj</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">r</span><span class="o">.</span><span class="n">hget</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">hash_name</span><span
class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="nb">id</span><span class="p">))</span> <span class="n">obj</span> <span class="o">=</span> <span class="n">pickle</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">unpickled_obj</span><span class="p">)</span> <span class="k">return</span> <span class="n">obj</span> </code></pre></div></div> <p>Pickling seemed like a good solution at first, but when we needed to nest remote objects, e.g. put <em>TangoMachine</em>s into the remote <em>TangoQueue</em>, we ran into the problem of <code class="highlighter-rouge">self.r</code> (of type <em>redis.StrictRedis</em>) not being serializable. To get around this, we defined custom <code class="highlighter-rouge">__getstate__</code> and <code class="highlighter-rouge">__setstate__</code> methods that do not include the <code class="highlighter-rouge">self.r</code> attribute in the serialized string, but set it back when the object is deserialized.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">__setstate__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">dict</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">__db</span> <span class="o">=</span> <span class="n">getRedisConnection</span><span class="p">()</span> <span class="bp">self</span><span class="o">.</span><span class="n">__dict__</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="nb">dict</span><span class="p">)</span> </code></pre></div></div> <p>Details of the implementation can be found in the <a href="https://github.com/autolab/Tango/blob/master/tangoObjects.py">TangoObjects module</a>.</p> <h2 id="producer-consumer-model">Producer Consumer Model</h2> <p>So far we have covered how Tango receives grading jobs and stores them.
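</p>

<p>For completeness, here is a sketch of how the <code class="highlighter-rouge">__getstate__</code>/<code class="highlighter-rouge">__setstate__</code> pair from the previous section fits together. The class name, the <code class="highlighter-rouge">__db</code> attribute, and the stubbed connection factory are illustrative, not Tango’s exact code:</p>

```python
# Sketch of the pickle state protocol used for the 'remote objects'.
# TangoQueue, the __db attribute, and the stub factory are illustrative.
def getRedisConnection():
    # Stub standing in for Tango's real Redis connection factory.
    return object()


class TangoQueue(object):
    def __init__(self, name):
        self.name = name
        self.__db = getRedisConnection()     # live connection; not picklable

    def __getstate__(self):
        # Serialize everything except the connection object.
        state = self.__dict__.copy()
        state.pop("_TangoQueue__db", None)   # name-mangled self.__db
        return state

    def __setstate__(self, state):
        # Re-establish a fresh connection, then restore the attributes.
        self.__db = getRedisConnection()
        self.__dict__.update(state)
```

<p>Round-tripping such an object through <code class="highlighter-rouge">pickle.dumps</code>/<code class="highlighter-rouge">pickle.loads</code> then yields a copy with a fresh connection instead of raising a serialization error.</p>

<p>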
The second most important part of the process is to read from the queue and distribute the jobs to the available containers (a cluster of virtual machines or Docker hosts). In the initial architecture, the consumer was part of the HTTP server process. A single Tango process would both produce and consume the jobs.</p> <p>Having a shared queue gave us the ability to run multiple <em>producers</em> as explained in the previous section. However, we only needed one <em>consumer</em> to distribute the jobs.</p> <p>Therefore, we decided to separate the consumer from the HTTP server (producer) as a standalone process.</p> <p><img src="/assets/redis3.svg" alt="Tango with Producer/Consumer Model" /></p> <p><em>Tango with Producer/Consumer processes and Persistent Memory Model</em></p> <p>The advantage of this architecture is that we can launch an arbitrary number of HTTP servers as we receive more load. There would be only one consumer process, which is responsible for reading from the queue and assigning the jobs. This design gives us a lot of flexibility because the consumer can run on any node, and can be stopped and migrated to a different machine at any time.</p> <hr /> <p>We are currently testing the new architecture and will be posting more about the outcomes as it gets used by more users and we come across interesting cases.</p> <p>Please feel free to ask questions in the comments below!</p> Fri, 03 Apr 2015 00:00:00 +0000 https://autolab.github.io/2015/04/making-backend-scalable/ https://autolab.github.io/2015/04/making-backend-scalable/ tango redis scalability Autolab: Autograding for All <p>Let’s face it: <strong>teaching is hard</strong>. There’s the obvious time commitment required for classes and lectures, but striving for excellence in education necessarily requires countless hours spent outside of the classroom.
This time generally gets split between</p> <ul> <li>developing curriculum and assignments to stay relevant,</li> <li>meeting with students (in office hours, breakout sessions, tutoring, etc.),</li> <li>answering questions (whether in person or remotely),</li> <li>grading assignments, quizzes, and tests,</li> </ul> <p>and many more areas. This list doesn’t even begin to capture the inherently difficult problem of adapting to meet students’ unique needs, a challenge which grows exponentially the more students there are.</p> <p>Looking at that list of challenges above, one entry stands out: grading. Very few people sit down to grade assignments with enthusiasm and vigor, which is unfortunate; students can’t learn from their mistakes until they know what they are. The time required to grade fundamentally works against student learning and growth.</p> <p>Teaching is hard, but at Autolab we help <strong>make it easier</strong>.</p> <h1 id="enhancing-learning">Enhancing Learning</h1> <p>For programming classes, we drastically cut down the submission-to-feedback time for students by exploiting autograding: programs evaluating other programs. In this model, teachers create autograder programs that define a number of tests to run on a student’s submission to assign it a grade.</p> <p>Autolab aims for flexibility: by utilizing user-customized Linux VMs, instructors have absolute control over their autograding environment. Autograders can be written in any language, using any software packages, frameworks, compilers, databases—the possibilities are endless. No matter the subject, Autolab can help make autograding a reality.</p> <p>Because of this flexibility, students can receive feedback on their assignments nearly instantaneously, closing the feedback loop in minutes rather than weeks. This opens the door to iterative learning: students are alerted to incorrect solutions immediately, enabling them to home in on and fix troubling mistakes in their code.
Over the course of one assignment, this feedback loop can run any number of times; each time, the student learns something beyond what they would have if the assignment had been graded manually.</p> <h1 id="fostering-community">Fostering Community</h1> <p>In addition to kindling learning among individual students, we at Autolab aim to foster a number of communities in all aspects of our work.</p> <h2 id="community-among-classmates">Community among Classmates</h2> <p>Autolab provides class-wide scoreboards for autograded assignments. Scoreboards are a fun and powerful motivation for students, and an excellent way to build community. For the students at the top of the class, scoreboards encourage refinement and healthy competition to keep up engagement. For other students, scoreboards help to clearly outline what’s required for full credit. Of course, all names are anonymized by student-selected nicknames—some secretive and others clever. In our experience, a mix of curiosity and competitiveness fosters a positive community where everyone wins, regardless of skill level.</p> <h2 id="community-among-educators">Community among Educators</h2> <p>Because autograders are self-contained, Autolab can provide a new community for instructors collaborating around designing, building, and iterating on quality assignments and labs. In the future, we hope to develop a platform where instructors can upload, discuss, share, and develop these assignments. This would help foster a community where educators can help each other improve their students’ learning.</p> <h2 id="community-among-users">Community among Users</h2> <p>We’re dedicated to this mission of fostering community, so <a href="https://github.com/autolab/Autolab">open-sourcing Autolab</a> was a natural decision. We actively seek feedback from our users about how to improve, and we welcome contributions!
Head on over to <a href="https://github.com/autolab/Autolab">GitHub</a> to browse the project source, read up on documentation, report issues, or open pull requests. If you notice something out-of-place or are dying to see a particular feature, please let us know! We also actively read mail at the Autolab Dev mailing list (see the footer). We love helping out as much as we can; this is only possible with your input.</p> <h1 id="autograding-for-all">Autograding for All</h1> <p>Our vision is to bring Autolab and the benefits of autograding to all programming and computer science classes, at the secondary and university levels. At Carnegie Mellon, where Autolab was initially conceived and is currently developed, we’ve seen Autolab’s success in the classroom. Each semester, we reach 3,000 undergraduate computer science students and run over 100,000 autograding jobs.</p> <p>Despite this, we’ve only scratched the surface of autograding’s potential. If you’re interested in using Autolab for your class or school, reach out and we’d be happy to help you get up and running. Also be sure to check out the <a href="https://github.com/autolab/Autolab">source and documentation on GitHub</a>.</p> Thu, 26 Mar 2015 08:14:30 +0000 https://autolab.github.io/2015/03/autolab-autograding-for-all/ https://autolab.github.io/2015/03/autolab-autograding-for-all/ release