Lessons learned porting legacy code to C++

This article is for programmers who are in the process of porting their code to C++, primarily from Objective-C, although most of these tips apply just as well when porting to C++ from other languages.

These are the experiences and processes I adopted while porting TilemapKit from Objective-C to C++.

The Recommended C++ Porting Process

Assuming that the port isn’t (supposed to be) a complete rewrite but rather a (more or less) 1:1 translation of the original code into another language, I found that the following process works wonders:

  • Create the to-be-ported class or method in C++
  • Copy & paste the original code (entire class or individual methods) into the C++ class or method
  • Comment out the original, non-C++ code (you may need to edit or remove existing multiline comments in the original code)
  • Start re-implementing the code in C++ line-by-line
  • You will quickly find that you need to add extra members, methods or even classes – you have two choices:
    1. create the dependent classes or methods as stubs, initialised to or returning safe default values
    2. comment out the use of dependent classes and method calls, but do initialise local variables (as stand-ins for return/out values) with a safe default value
  • For every chunk of code that you’ve successfully re-implemented in C++, remove the corresponding lines from the original, commented-out source code.
    • That way all remaining “original code” comments become your TODO list, or at least a list of things you should come back to later, review, and decide whether there’s still something left to do. If not, simply remove the remaining commented code.
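Here is a minimal sketch of what a method might look like midway through this process. The Tileset class, its members and the Objective-C fragment in the comment are hypothetical, invented only to illustrate the workflow: ported lines are live C++, the not-yet-ported original stays commented out as the TODO list, and a stub returns a safe default.

```cpp
#include <cassert>

// Hypothetical class in the middle of being ported from Objective-C.
class Tileset
{
public:
    Tileset(int imageWidth, int tileWidth)
        : _imageWidth(imageWidth), _tileWidth(tileWidth) {}

    int columnCount() const
    {
        // Ported: the matching Objective-C line was deleted from the
        // comment below once this C++ line worked. (The zero check is
        // an extra safety net, not part of the original.)
        int columns = (_tileWidth > 0) ? _imageWidth / _tileWidth : 0;

        // Not ported yet; the remaining original code is the TODO list:
        //
        // if (_spacing > 0)
        // {
        //     columns = (_imageWidth + _spacing) / (_tileWidth + _spacing);
        // }

        return columns;
    }

    // Stub for a dependency (choice 1 above): returns a safe default
    // until the real implementation has been ported.
    int spacing() const { return 0; }

private:
    int _imageWidth;
    int _tileWidth;
};
```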

The important part is adding the original source code as your TODO list right inside the class or method you are porting. Every unported code fragment you leave in place is a reminder that there might still be something left to do. There are several valid reasons for leaving commented-out code fragments in place:

  • You wanted to get a working build more quickly.
  • You aren’t sure whether you’ll need that code/feature/hack in C++ anyway.
  • The remaining code may be non-trivial to re-implement, and you want to leave it for a time when you have the mental energy (or simply the time) to think it through.

Taming the STL with typedef

One issue with the STL is that it quickly leads to a hard-to-read mess of convoluted code. This is especially painful for developers who come from modern languages and are used to highly readable, intuitive framework code.

Instead, you are required to write crap like this, and this is just the declaration of a member variable:
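As a hypothetical example of the kind of declaration meant here, assume _terrainsByGID maps tile GIDs to lists of shared terrain pointers. Only the member name comes from the text; Terrain and Tileset are placeholder types, not TilemapKit's real ones.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

struct Terrain { int id; };  // hypothetical terrain type

class Tileset  // hypothetical owner of the member
{
public:
    // The raw STL spelling of a single member variable.
    std::unordered_map<std::uint32_t, std::vector<std::shared_ptr<Terrain>>> _terrainsByGID;
};
```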

Of course, once you start writing more code you may have methods that take a _terrainsByGID instance as a parameter, so you’ll have to spell out this whole shebang over and over again. You know this will become a huge problem!

So, if you think that using namespace std; would solve most STL readability issues, it doesn’t: it only cuts off the tip of the iceberg. It also increases the risk of name clashes, since there’s a good chance that another namespace you are using also declares functions like begin(), end() or find().

What you should be doing instead is to avoid using STL names directly in your code and to create typedefs for every variation you need. This reduces mental load, cuts down on the amount of code to write, makes types easier to autocomplete and easier to change later (e.g. when switching from map to unordered_map), and lastly improves the overall readability of your code while tremendously reducing the chances of making stupid mistakes!

Simply add a global header where you typedef all the STL collections and similar types you use. For the above example you would need:
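A sketch of such a header, assuming (hypothetically) that _terrainsByGID maps tile GIDs to lists of shared Terrain pointers. The Terrain type and all typedef names are invented for illustration, following the naming idea described in the text.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

struct Terrain { int id; };  // hypothetical terrain type

// Hypothetical global types header: one typedef per STL variation in use.
typedef std::uint32_t GID;                          // tile global ID
typedef std::shared_ptr<Terrain> TerrainPtr;        // one shared terrain
typedef std::vector<TerrainPtr> TerrainPtrVector;   // a list of terrains
typedef std::unordered_map<GID, TerrainPtrVector> TerrainsByGID;  // GID -> terrains
```

Since C++11 the same aliases can also be written with using, e.g. using TerrainPtr = std::shared_ptr<Terrain>; which some find easier to read for long types.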

You’ll notice how these typedefs naturally build on each other, and a proper naming scheme makes the intentions easy to understand, even for new programmers. The member variable, and all its uses in method parameters, then simply become:

And you could pass an instance of this complex map to a function like so:
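Continuing the same hypothetical example (a map from tile GIDs to lists of shared Terrain pointers; all type and function names are invented), the member declaration and a function taking the map could look like this. The typedefs are repeated so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

struct Terrain { int id; };  // hypothetical terrain type

typedef std::uint32_t GID;
typedef std::shared_ptr<Terrain> TerrainPtr;
typedef std::vector<TerrainPtr> TerrainPtrVector;
typedef std::unordered_map<GID, TerrainPtrVector> TerrainsByGID;

class Tileset  // hypothetical owner
{
public:
    TerrainsByGID _terrainsByGID;  // the member variable, now readable
};

// Hypothetical helper taking the map as a parameter:
// counts all terrains across all GIDs.
inline std::size_t countTerrains(const TerrainsByGID& terrainsByGID)
{
    std::size_t count = 0;
    for (const TerrainsByGID::value_type& entry : terrainsByGID)
    {
        count += entry.second.size();  // entry.second is a TerrainPtrVector
    }
    return count;
}
```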

Whoops! This is C++ that almost feels modern!

Global functions emulating Framework behavior

Another thing I found very useful was to re-implement the Cocoa functionality that the STL lacks as global functions with the same or similar names. The easiest way to do so is to write static C functions, possibly even inlined. Don’t be afraid or ashamed to write C functions!

Take, for instance, NSString’s lastPathComponent method, which returns a string’s last path component: either the last folder’s name or the filename. The STL has no such thing, so I wrote this in a globally included header file:
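The following is only a sketch of how such a helper could be written, using the String and StringRef typedefs mentioned at the end of this section. The edge cases (trailing slashes, a lone "/") follow my understanding of NSString's documented behavior and are worth double-checking. On C++17 and later, std::filesystem::path::filename() covers part of this, though it returns an empty path for a trailing slash instead of the last folder name.

```cpp
#include <cassert>
#include <string>

typedef std::string String;
typedef const std::string& StringRef;

// Sketch of an NSString-style lastPathComponent (assumed semantics):
// "/tmp/scratch.tiff" -> "scratch.tiff", "/tmp/" -> "tmp", "/" -> "/".
inline String lastPathComponent(StringRef path)
{
    // Ignore trailing slashes, so "/tmp/" behaves like "/tmp".
    const String::size_type end = path.find_last_not_of('/');
    if (end == String::npos)
    {
        // Empty string, or a string consisting only of slashes.
        return path.empty() ? String() : String("/");
    }

    // The component starts right after the last slash before 'end'.
    const String::size_type slash = path.find_last_of('/', end);
    const String::size_type start = (slash == String::npos) ? 0 : slash + 1;
    return path.substr(start, end - start + 1);
}
```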

What used to be several lines of convoluted code now becomes a single function call that you can make regardless of which class you’re in. Of course, you could also create a helper C++ class and put those functions in there, but personally I prefer to be able to do this:

Over this:
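A sketch comparing the two call styles. PathUtils is a hypothetical class name, and the lastPathComponent body is deliberately simplified (it ignores trailing slashes) to keep the snippet short.

```cpp
#include <cassert>
#include <string>

typedef std::string String;
typedef const std::string& StringRef;

// Global free function (simplified body for this sketch).
inline String lastPathComponent(StringRef path)
{
    const String::size_type slash = path.find_last_of('/');
    return (slash == String::npos) ? path : path.substr(slash + 1);
}

// Alternative: the same helper as a static member of a hypothetical
// utility class, which keeps it out of the global namespace.
class PathUtils
{
public:
    static String lastPathComponent(StringRef path)
    {
        // '::' selects the global function rather than this member.
        return ::lastPathComponent(path);
    }
};
```

With the free function a call reads String name = lastPathComponent(path); whereas with the class it becomes String name = PathUtils::lastPathComponent(path);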

But this is up to you, and the latter is preferable if you want to avoid the risk of name clashes.

The important part is to be meticulous about creating separate functions for every task that the framework you’re used to handled for you. Trust me: the next time you need the same or similar functionality, you won’t remember how to write that code. Plus you’d be duplicating code, and we all know that’s wrong.

PS: You probably assumed this already but just to confirm: String and StringRef are typedefs that map to std::string and const std::string& types.

Name your Spaces!

It seems obvious to C++ developers, but chances are you are porting your code from a language that doesn’t have a namespace keyword. In Objective-C, for instance, it is customary to prefix your class names to avoid name clashes:

Stop doing that in C++! You get to use namespaces instead. Much better!

Enclose the code in all of your header and implementation files in the same namespace. That makes it easy to write code like this:
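A minimal sketch with hypothetical Tilemap and Tileset classes: where Objective-C code might name them TKTilemap and TKTileset, in C++ the prefix moves into a TK namespace, and code inside the namespace refers to its siblings without any prefix.

```cpp
#include <cassert>
#include <string>

// Hypothetical framework header: everything lives in namespace TK.
namespace TK
{
    class Tileset
    {
    public:
        explicit Tileset(const std::string& name) : _name(name) {}
        const std::string& getName() const { return _name; }

    private:
        std::string _name;
    };

    class Tilemap
    {
    public:
        // Inside the namespace no prefix is needed: Tileset is TK::Tileset.
        Tileset createTileset(const std::string& name) const
        {
            return Tileset(name);
        }
    };
}
```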

Outside your project’s code you can either add a using directive:

to get the same effect, or simply prefix the classes with your namespace name. For instance, an end user of your framework who doesn’t add the using directive would have to write:
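The two options for code outside the namespace, sketched with a minimal hypothetical TK::Tilemap class. Scoping the using directive to a function (or a .cpp file) rather than putting it in a header keeps the prefix-free names from leaking into other people's code.

```cpp
#include <cassert>

namespace TK  // minimal stand-in for the framework's namespace
{
    class Tilemap
    {
    public:
        int getWidth() const { return _width; }  // hypothetical accessor

    private:
        int _width = 32;
    };
}

// End user option 1: qualify every use with the namespace.
inline int widthQualified()
{
    TK::Tilemap map;
    return map.getWidth();
}

// End user option 2: a using directive, limited here to function scope.
inline int widthWithUsing()
{
    using namespace TK;
    Tilemap map;
    return map.getWidth();
}
```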

There: no clashes, yet anyone who wants to can get rid of the TK:: prefix with a using directive.

Going full auto?

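C++11 introduced the auto keyword. It is comparable to Swift’s type inference with let/var, and only superficially similar to Objective-C’s id type: auto is resolved at compile time to one concrete static type, while id is a dynamically typed object reference. It allows you to declare a variable whose type the compiler infers from the context. For instance, you can rewrite this: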

into this:
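For instance, with a hypothetical std::map lookup, the explicitly typed version and the auto version below are equivalent; in both cases the iterator is a std::map<std::string, int>::const_iterator.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, int> ScoreMap;  // hypothetical lookup table

inline int findWithExplicitType(const ScoreMap& scores, const std::string& key)
{
    // Before: the iterator type spelled out in full.
    std::map<std::string, int>::const_iterator it = scores.find(key);
    return (it != scores.end()) ? it->second : -1;
}

inline int findWithAuto(const ScoreMap& scores, const std::string& key)
{
    // After: the compiler deduces the very same const_iterator type.
    auto it = scores.find(key);
    return (it != scores.end()) ? it->second : -1;
}
```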

So it does save some typing. The problem with auto is that it cannot be applied in all situations, which limits its usefulness. Plus it hides the type and may therefore actually make the code harder to read! Take this example:

Okay, so what is instance again? You can’t tell by glancing at the code. Personally, in such cases I would avoid auto, at least until I’m more familiar with the API. Until then, I’d simply write the type out as usual:

Oh wow, that was actually a reference being returned, not a pointer. Who would have thought?
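There is one more trap hiding in that example, sketched here with a hypothetical GameState singleton: plain auto never deduces a reference, so auto instance = GameState::getInstance(); silently makes a copy, and changes to it don't reach the singleton. You need auto& (or the explicit reference type) to bind to the original. A real singleton would usually delete its copy constructor, which turns this mistake into a compile error.

```cpp
#include <cassert>

// Hypothetical singleton whose accessor returns a reference.
class GameState
{
public:
    static GameState& getInstance()
    {
        static GameState instance;  // created once, on first use
        return instance;
    }

    int frameCount = 0;
};

inline void countFrames()
{
    auto copy = GameState::getInstance();  // auto drops the '&': a COPY
    copy.frameCount += 1;                  // modifies the copy only

    auto& ref = GameState::getInstance();  // auto& keeps the reference
    ref.frameCount += 1;                   // modifies the singleton

    GameState& explicitRef = GameState::getInstance();  // type spelled out
    explicitRef.frameCount += 1;           // modifies the singleton
}
```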

To const or not to const?

Ensuring const correctness in your code is important, even more so if you create a framework/API for other users.

However, the concept isn’t easy to grasp if your language doesn’t support or use const as extensively as C++ does. So I bet most would eventually give up and not use const at all. Yup, that code works, but it won’t be the best, fastest, safest C++ code you can write.

It’s well worth the time to read up on const correctness and start applying it, even though it’ll bother you in seemingly unpredictable ways. You’ll get errors simply because of your use of const, and every time you’ll have to analyse the situation and decide whether const really applies, or whether you made a bad design choice and need to refactor your code. That kind of thinking helps improve your code quality, and const correctness is one way of getting there.
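A small sketch of what const correctness looks like in practice, using a hypothetical Layer class: const member functions can be called through const references, non-const ones can't, and a function taking a const reference promises its caller that nothing will be modified.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical layer class illustrating const correctness.
class Layer
{
public:
    explicit Layer(const std::string& name) : _name(name) {}

    // const member function: promises not to modify the object,
    // so it can be called through const references.
    const std::string& getName() const { return _name; }

    // Non-const: modifies the object, so a const Layer can't call it.
    void setName(const std::string& name) { _name = name; }

private:
    std::string _name;
};

// Takes a const reference: the caller knows its layers won't be modified.
inline std::size_t totalNameLength(const std::vector<Layer>& layers)
{
    std::size_t total = 0;
    for (const Layer& layer : layers)
    {
        total += layer.getName().size();  // OK: getName() is const
        // layer.setName("x");            // would not compile: layer is const
    }
    return total;
}
```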

However, the downside is that it requires discipline, and you may want to forgo it if you’re under time pressure. I understand. Still, keep it in the back of your mind and fix it later. But know that it will be even more painful to do so later on, because the bad design decisions have already been made and are now in use.

So this is simply a call to action for you to look into const correctness and learn that concept if you aren’t aware of it yet.

Pass by value or by reference?

One question that comes up early when writing C++ code is the difference between passing parameters by value or by reference. In Objective-C it’s really simple: you pass a pointer to the method. Done.

In C++ you can either do that, or pass by reference. You should prefer the latter whenever possible:

Both functions behave exactly the same, except that the member access operator changes from -> to a simple dot in the latter:
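A sketch of the two variants, using a hypothetical Sprite struct: the pointer version uses -> and should check for null, while the reference version uses a dot and needs no check.

```cpp
#include <cassert>

struct Sprite { int x = 0; };  // hypothetical type

// Pointer parameter (the Objective-C habit): '->' plus a null check.
inline void moveByPointer(Sprite* sprite, int dx)
{
    if (sprite != nullptr)
    {
        sprite->x += dx;
    }
}

// Reference parameter: '.' instead of '->', and no null check needed.
inline void moveByReference(Sprite& sprite, int dx)
{
    sprite.x += dx;
}
```

At the call site the pointer version needs the address, moveByPointer(&sprite, 5), while the reference version takes the object directly, moveByReference(sprite, 5).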

There’s one other difference: references can’t be NULL (well, they can, but only as the result of undefined behavior, a really bad programming error that you know better than to make). So you needn’t check for null at all.

Passing by reference has another benefit: if the passed object is a large struct, passing by reference is faster than passing by value, because passing by value copies the entire struct’s memory into a new object. So this is said to be most efficient:
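A sketch with a hypothetical TileLayerData struct holding 4096 GIDs (16 KB, assuming a 4-byte unsigned int). Passing it by value copies all of it on every call; a const reference avoids the copy and guarantees the caller's data stays untouched, while a non-const reference lets the function modify it.

```cpp
#include <array>
#include <cassert>

// Hypothetical "large" struct: 4096 tile GIDs, zero-initialized.
struct TileLayerData
{
    std::array<unsigned int, 4096> gids{};
};

// By value: the whole struct is copied on every call.
inline unsigned int firstGIDByValue(TileLayerData data)
{
    return data.gids[0];
}

// By const reference: no copy, and the caller's data can't be modified.
inline unsigned int firstGIDByConstRef(const TileLayerData& data)
{
    return data.gids[0];
}

// By non-const reference: no copy, and the caller's struct is modified.
inline void clearFirstGID(TileLayerData& data)
{
    data.gids[0] = 0;
}
```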

Passing by non-const reference also lets the function modify the struct created by the caller. So even if the differences compared to passing pointers seem marginal, passing by reference should be your default choice, and you only resort to pointers when the argument may be NULL, or for similar reasons.

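There are many subtleties surrounding this topic. Passing small types by value can be faster than passing them by reference, and when a function needs its own copy of an argument anyway, taking it by value and using C++11’s move semantics can avoid the extra copy that a const reference would force.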
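One common case is the sink parameter, sketched below with a hypothetical Tileset class: if a function stores its argument anyway, taking it by value and moving it into place lets callers pass temporaries without any copy, whereas a const reference parameter would always force a copy into the member. For small types such as int or double, plain pass by value is the usual choice.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class storing a string it receives.
class Tileset
{
public:
    // Sink parameter: taken by value, then moved into the member.
    // An lvalue argument is copied once (into 'name'); an rvalue
    // argument (e.g. a temporary) is moved in without any copy.
    void setName(std::string name) { _name = std::move(name); }

    const std::string& getName() const { return _name; }

private:
    std::string _name;
};
```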

But overall, you will be fine (and safer) using (const) references as the default function argument type over pointers or passing by value.

Inline Property Accessors

Yup, C++’s most tedious aspect. For every private member variable you may need to write a getter and optionally a setter method. These are typically prefixed with get and set, respectively.

Fortunately, there’s at least one shortcut: inlining accessor methods straight in the header file. So rather than declaring the accessors in the header and implementing them in the cpp file, you can implement the simple accessors directly in the header file:
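A sketch of such a header, with a hypothetical Tilemap class. Member functions defined inside the class body are implicitly inline, so no inline keyword is needed.

```cpp
#include <cassert>

// Hypothetical header: simple accessors implemented inline in the class.
class Tilemap
{
public:
    int getWidth() const { return _width; }       // getter: trailing const
    void setWidth(int width) { _width = width; }  // setter

    bool isVisible() const { return _visible; }
    void setVisible(bool visible) { _visible = visible; }

private:
    int _width = 0;
    bool _visible = true;
};
```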

This keeps your implementation file from being cluttered with simple accessor methods. Unfortunately, that’s about it. You can try to reduce this boilerplate using either preprocessor macros or complete classes encapsulating properties, but the former is rife with documentation issues (Doxygen doesn’t pick up macro-generated getters/setters) and the latter seems like plain and simple overkill.

PS: Notice how the getter has a trailing const keyword? That tells both you and the compiler that calling this method will not modify the object’s members. It also means the getter can be called on const objects and through const references, which is one aspect of const correctness (see above), and depending on context it may give the compiler some room for optimisation.

That’s it, for now!

I hope you find some of this interesting. Please leave a comment if you have any questions or suggestions.

If I come across something of interest while porting TilemapKit to C++ (Cocos2d-x) I’ll let you know.
