Complexity and Third-party Tools

Great article here from Allen Holub.

The only thing I would add is that I don’t think his experiences with Jersey are, in any way, an exception. I think they are the rule.

Which is why it is vitally important to thoroughly examine, test, and vet any third-party component you choose to use:

  1. Start with a simple list of the basic capabilities you need. If you don’t have this, you shouldn’t even be looking at third-party components, even if they are the current hot tech in the field.
  2. Does a candidate provide the capabilities you need? Most evaluations seem to stop here. Don’t do that…
  3. How much does the candidate provide beyond what you need? The fact that you might someday take advantage of a feature you don’t currently need isn’t really relevant to the evaluation…
  4. How many other things does the package depend on? Are you going to end up with 3 or 4 logging packages because each third-party component you incorporate uses a different one?
  5. How easy is it to use? If its usage is more complex than the basic capabilities you need warrant, you’ll probably want to create wrappers (see the sketch below), which will reduce the ROI on using the component.
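
As an illustration of that last point, here is a minimal sketch of such a wrapper. Everything named Acme.Logging is a made-up stand-in for some third-party component, not a real library; the point is only that the rest of your code sees the narrow interface you actually need:

using System;

// Hypothetical stand-in for a third-party logging component with a much wider API than we need.
namespace Acme.Logging
{
    public enum LogLevel { Trace, Debug, Info, Warn, Error }

    public class Logger
    {
        public void Log(LogLevel level, string message)
        {
            Console.WriteLine("[" + level + "] " + message);
        }
        // ...imagine dozens of other members we never use...
    }
}

// The narrow surface the rest of the codebase programs against.
public interface IAppLog
{
    void Info(string message);
    void Error(string message, Exception ex);
}

// Thin adapter: if we later swap components, only this class changes.
public sealed class AcmeLogAdapter : IAppLog
{
    private readonly Acme.Logging.Logger _inner;

    public AcmeLogAdapter(Acme.Logging.Logger inner)
    {
        _inner = inner;
    }

    public void Info(string message)
    {
        _inner.Log(Acme.Logging.LogLevel.Info, message);
    }

    public void Error(string message, Exception ex)
    {
        _inner.Log(Acme.Logging.LogLevel.Error, message + ": " + ex);
    }
}

Writing and maintaining that adapter isn't free, which is exactly the cost that eats into the ROI of adopting the component in the first place.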

Also, I think it’s important to realize that selecting a framework (in the sense of a full-featured development platform like .NET) that you are going to build your system on top of requires different considerations than selecting a component to fill a specialized niche of functionality in your system. The biggest difference is that with a framework you aren’t realistically able to roll your own, but with a component that is always an option…

Posted in design, Software | Leave a comment

Interviewing Candidates

I was reading an interesting post on interviewing today, and came across this:

Whiteboard and coding interviews also routinely ask people to re-implement some common solution or low-level data structure. This is the opposite of good engineering. A good engineer will try to maximize productivity by using libraries to solve all but the most novel problems.

While I can’t disagree with the general point, I think it glosses over an important issue: Being able to reuse libraries does not imply being able to reuse them correctly.

I’ve been shocked and saddened at times to come across candidates who lack a basic understanding of fundamental data structures, algorithms, and ideas of computer science (or software engineering) as well as bedrock knowledge about how computers work.

One candidate I interviewed had just graduated with a B.S. degree and didn’t know how many bits were in a byte. She very quickly moved out of programming and into management. Another candidate didn’t know what a linked list was.

A linked list. Really.

Now I’m not saying they should write a linked list class in an interview; if they’ve been exposed to it at all, it was probably in Data Structures 101, where they also implemented it. However, understanding the characteristics of a singly- versus a doubly-linked list, versus an array, versus a heap is important, both for understanding the performance profiles of those data structures and for knowing what each is and isn’t suited for.
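
For example, the two inserts below have very different cost profiles even though both "just add an item". This is a minimal illustration using the standard .NET collections:

using System.Collections.Generic;

var linked = new LinkedList<int>();
var array = new List<int>();

for (int i = 0; i < 100000; i++)
{
    linked.AddFirst(i);  // O(1): rewire a couple of node references
    array.Insert(0, i);  // O(n): every existing element shifts right, so the whole loop is O(n^2)
}

// The flip side: array[50000] is O(1) random access, while reaching the
// 50,000th node of the linked list means walking 50,000 links.

That is the level of understanding I'm probing for, not the ability to write LinkedList<T> from scratch.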

Same with sorting. Bubble sort, insertion sort, quicksort. From an interviewing standpoint what is important is not that a candidate can implement any particular sorting algorithm, but that they understand Big-O notation and how the performance of sorting or merging is dictated by the data structures chosen as well as the nature of the data being sorted.
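
To make that concrete: sorting a million items with an O(n log n) algorithm is on the order of 20 million comparisons (10^6 × log2(10^6) ≈ 10^6 × 20), while an O(n^2) algorithm like bubble sort is on the order of 10^12, roughly 50,000 times more work. That back-of-the-envelope arithmetic is the kind of instinct I want a candidate to have.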

Now, there are lots of ways to assess whether or not a candidate is well-versed in these topics, and one of those ways is to ask them to sketch out an implementation of one of them. I don’t think that’s the most efficient or reliable way so I, like Laurie Voss, wouldn’t use that in an interview.

That’s the kind of evaluation I’m trying to make in an interview, though: how well does the candidate understand the relevant fundamentals — those I mention above, but also many others — and do they know how to apply them?

Posted in Software | Tagged | 3 Comments

What Do Students Know?

There is a lot of excitement about MOOCs these days, as well as conjecture about how higher education might look in the future. MIT is considering a more à la carte approach to structuring their programs.

I think the increasing access to sources of learning is fabulous and has no drawbacks of any significance whatsoever. However…

There’s a huge difference in what you learn in self-directed study as opposed to studying a complex topic under the tutelage of a professional or a curriculum designed by professionals. As a student coming in cold to a topic like computer science there are hundreds of directions your interests might take you, and there is also probably an online course or source of information that will help you along.

But I would contend that the learning-by-browsing model is not an effective approach when applied to an entire field of knowledge. It works wonderfully for very directed, contained topics. But in my view the student doesn’t know enough to direct their own studies, at least until they’ve achieved a certain level of fluency with the basic material of the area.

And that, I think, is the difference between those who are self-educated and those who are educated via a curriculum: the self-educated have no way of being assured that they have a mastery of the breadth of the important material. Can the self-educated achieve such a mastery? Sure. But mastery of a field like computer science involves training your mind to think in abstractions while at the same time being aware of the minutest details of the technology being used, and I don’t think it is easy, natural, or guaranteed that a course of self-study will get you there. You can be an absolute expert at Ruby and a horrible software engineer. (Replace “Ruby” with your technology of choice…)

So is college necessary in computer science/programming/hacking? Not to master any particular bit of technology. But it is certainly the best path to ensuring a base level of knowledge from which to master such a huge and complex field of study. It may not even be necessary for securing a lucrative job or starting a company that delivers software. But don’t be fooled: monetary success does not mean you know what you’re doing.

Posted in Rant | Tagged , | Leave a comment

“Friend” classes, C#, and encapsulation

Ran into an interesting situation today:

  1. Client-server application
  2. A class representing domain data that is used by both client and server assemblies — so that’s three assemblies:
    1. One for the data class
    2. One for the server class (uses the data assembly)
    3. One for the client class (uses the data assembly)
  3. Data class has getters and setters for its properties
  4. Some of the setters should only be callable by code in the server; for example, a GUID that is assigned by the server to identify the item

In C++ this would be pretty easy; just declare the setters private and make the server a friend.

In C#, the (almost) equivalent technique is to make the setter internal and then expose the data assembly’s internals to the server assembly via the InternalsVisibleTo attribute. But this opens things up way too much! Now any internal member of any class in the entire data assembly can be accessed by any class/function in the server assembly. (And don’t tell me to use single-class assemblies or I’ll punch you in the head — it’s simply not practical for a non-trivial application.)
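
For concreteness, here is roughly what that looks like. The assembly and class names (MyApp.Data, MyApp.Server, WorkItem) are made up for illustration; InternalsVisibleTo itself is the real mechanism being described:

// In the data assembly (hypothetically MyApp.Data), in AssemblyInfo.cs or any source file:
using System;
using System.Runtime.CompilerServices;

[assembly: InternalsVisibleTo("MyApp.Server")]  // exposes ALL internals of this assembly, not just one setter

namespace MyApp.Data
{
    public class WorkItem
    {
        // Everyone can read the id, but only code in MyApp.Data or MyApp.Server can assign it.
        public Guid Id { get; internal set; }
    }
}

(And if the server assembly is strong-named, the attribute also needs the friend assembly’s public key, which adds yet more friction.)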

Searching with Mr. Google mostly finds conversations that consist of “How do I make a friend class in C#?” followed by a response of “You can’t, and friends break encapsulation and are never necessary in a proper OOP design”. I think this represents pedantic and faulty thinking.

In the example I give here I don’t accept that the basic design is flawed — it falls out quite naturally from the nature of the architecture. And friend actually preserves encapsulation much better than internal (or simply keeping the setters public) because it limits the scope of the presumed “damage”.

I’ve no doubt that there are other designs that could accomplish this, but not without exacting a high cost in added complexity or development overhead. And to be clear, this is a pattern that will be found multiple times in any client-server design, so the complexity will stack up quickly!

The problem here is that the unit of encapsulation is actually the totality of the 3 classes: data, client, and server. But the programming tools in use do not easily allow the expression of that level of encapsulation without undue complexity.

So this, I argue, is why friend is needed.

And what is “need” anyway? We don’t need to use C#; other languages would be just as suitable. In fact, we don’t need the software we’re building at all, because we could continue to do things manually. However, we’ve made choices along the way to achieve some particular goal, and each choice limits us and constrains our problem-solving environment from that point on. Within this context, we need tools that make what we have to do reasonably simple (or at least no more complex than it has to be).

To me, this is a great example of why “best practice” or other pedantic approaches are problematic at best, and destructive at worst…

Posted in Coding, design, Software | Tagged , | Leave a comment

Design Damage and Mocks

Two recent articles that are well worth reading. First, from David Heinemeier Hansson:

Test-induced design damage

And a rebuttal from Bob Martin:

Test-Induced Design Damage?

I think both articles make some good points. On this issue I tend to agree more with David than Bob.

However, I think one of the main things both positions suffer from is a rather myopic view of testing. The focus is almost exclusively on unit testing. David asserts, I think convincingly, that you can damage your design by contorting it to be testable in the sense of old-school unit testing.

I think the problem is this focus on old-school unit testing. I also assert (without proof for the moment) that unit testing is a fiction. Any test code is always testing more than the strict object of the test. If nothing else you are also testing the run-time support of the programming language you are using. Yes, the test is geared toward a particular aspect of your design or implementation, but the notion of a test running in isolation is illusory.

If instead you look at a test as exercising a portion of a system, but with an eye toward proving out one aspect of it, the need to mock everything to death goes away. You spend your time building up an execution framework that doesn’t abstract away all the implementation details that, in the end, are just as important as whatever the object of your test is.
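
To make the distinction concrete, here is a sketch of what I mean (the types and the xUnit-style test are invented for illustration, not code from either article). Rather than handing OrderService a mock repository with scripted answers, the test wires up a real, trivially simple store and exercises that whole slice:

using System.Collections.Generic;
using System.Linq;
using Xunit;

public interface IOrderStore
{
    void Add(decimal amount);
    IEnumerable<decimal> All();
}

// A real, trivially simple implementation -- not a mock with scripted expectations.
public class InMemoryOrderStore : IOrderStore
{
    private readonly List<decimal> _amounts = new List<decimal>();
    public void Add(decimal amount) { _amounts.Add(amount); }
    public IEnumerable<decimal> All() { return _amounts; }
}

public class OrderService
{
    private readonly IOrderStore _store;
    public OrderService(IOrderStore store) { _store = store; }
    public void Record(decimal amount) { _store.Add(amount); }
    public decimal Total() { return _store.All().Sum(); }
}

public class OrderServiceTests
{
    [Fact]
    public void Total_sums_recorded_orders()
    {
        // Exercises OrderService *and* a real store together; the "unit" here is the slice.
        var service = new OrderService(new InMemoryOrderStore());
        service.Record(10m);
        service.Record(2.5m);
        Assert.Equal(12.5m, service.Total());
    }
}

Nothing in that test is scripted to return canned answers, yet it still proves out one specific aspect of the behavior.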

Building automated tests does not induce design damage. Mocking induces design damage.

Posted in design, Software, Testing | Leave a comment

The Requirements Conundrum

Requirements are handled in a variety of ways by different projects. They might be written down formally, or stated informally, or not really written down at all. They might be determined before work begins, or be discovered during development, or discovered through intentional product iterations. They might be primarily based upon competitive needs, user needs, or be something so new there is no “need” context to fit it into.

And every combination thereof and more.

By all accounts, changing requirements have a huge impact on the development of the product, primarily, but not just, on its schedule. As requirements change, developers have to reevaluate their fundamental assumptions, which might invalidate a feature design, and sometimes, an entire architecture. I mean, not every requirement change will do that — some are quite benign and can even simplify the project.

But regardless of the development paradigm or the way requirements are managed, it is inevitably true that from the project start to the project end the requirements will change. Sometimes they’ll simply wobble about and induce fear and uncertainty, but other times they’ll be charging and shifting direction like a lunatic halfback. (Ha! I pulled off a sports metaphor!)

Why, you ask? Why, why, why? Can’t this unruly beast be brought to sanity?

No. Sorry.

Here are just a few of the reasons:

  1. Requirements are almost always poorly understood at the start of the project. Through the course of the project we are building knowledge that we didn’t have at the start. This is not really something that can be avoided via big upfront requirements-gathering either. It is a necessary part of any product development process. The problem with software is that we jump into an implementation which we then can’t throw away like industrial designers can throw away prototypes.
  2. The thoroughness and correctness of requirements is heavily dependent on having domain or subject-matter experts of one kind or another. If you don’t have them, or they aren’t the right ones for the project, the requirements-understanding process suffers.
  3. Especially over product lifetimes, or for projects with very long development cycles, these people may come and go from the team, yielding an ever-shifting view of the real requirements.
  4. Implementing a given requirement can often turn out to be much more difficult than was originally imagined. When this happens there needs to be some sort of negotiation process where the now-understood cost of the capability is balanced against its value, or even the need for it.
  5. Organizational dysfunction can severely affect requirements understanding too. If the developers and/or subject-matter experts are not empowered to define the requirements, well, you get what you might expect.

In my opinion and experience, requirements churn is inevitable. Avoidance is not possible, and adhering blindly to your first guess at requirements is a recipe for product failure and teamicide.

Wise processes and managers will plan for requirements churn, but the truly enlightened will welcome it because it means that your product has just gotten better. Which is all well and good until you start talking about schedule; that’s where the rubber meets the road, slips, and skitters out of control into the ditch.

If you change a requirement (or discover a new, extremely important requirement) and need more time to complete it, you have three choices:

  1. Accept the work
    1. Slip the schedule
    2. Add more resources
  2. Reject the work, sticking with the old requirement and letting the product suffer
  3. Temporarily reject the work by postponing the release of the feature

Depending on the situation, any one of these might not be politically feasible. But adding more resources is often not technically feasible — some tasks simply can’t be effectively broken down further, or it may take too long to bring new resources up to speed.

So this is the requirements conundrum: Requirements will change, and you must deal with the changes, but you can’t really know how to deal with them until after the change has occurred.

Posted in Process, Software | Tagged | Leave a comment

Premature Optimization, or just not Wasteful?

I completely agree with the prevailing wisdom that optimization should only follow a demonstrated need for it. All smug and self-assured I code away merrily until I run into a situation that makes things all messy and complicated…

Here’s the situation: I want to output a logging message containing a snippet of results from a JSON payload returned from a REST call:

string toLog = json.Substring(0, 30);

Nice and simple. However, it turns out that the JSON data contains newlines which kind of mess up the flow of my logging statements — I’d prefer it if the string was all on one line. Easy enough to accomplish:

string toLog = json.Substring(0, 30).Replace("\n", "");

Hm. But this is kind of wrong. Now instead of a 30 character snippet of the JSON data, I’ll have 30 characters less however many newlines there were within the first 30 characters. So to fix this:

string toLog = json.Replace("\n", "").Substring(0, 30);

Nice! But wait… These JSON payloads can be fairly large — multiple thousands of characters. That means I’ll essentially be copying a very large string in the course of doing the Replace(), only to extract a very small part of it.

Now, in principle, I really shouldn’t worry about it until I’ve seen some evidence that the cost of doing the Replace() call is prohibitive. And depending on the context, I would do just that. And indeed, in this particular case, the performance overhead will probably not be noticeable.

But it chafes me. It steams my beans! I shouldn’t be creating a copy of a multi-K string just to grab 30 characters! That’s just plain wasteful and I don’t like it.

I’m prepared at this point to entertain the theory that I’m making a big deal out of a small thing. I do that sometimes. However, as engineers we face this kind of decision over and over. In the course of building a million-line codebase, how much inefficiency and overall gunk does this kind of thing add? How do slightly inefficient operations add up to influence efficiency in the large? In the past, Moore’s Law has protected us, but some are arguing that that protection is soon to lapse.

I don’t have a general solution. In this case I went with the more efficient approach because a) it only required a slight modification to the desired behavior, in that I wouldn’t always get a full 30 characters, and b) it felt like the right thing to do.
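
For what it’s worth, there is a middle ground that avoids both the whole-payload copy and the short-snippet quirk: scan only as far as needed to collect 30 non-newline characters. A minimal sketch; the helper name and its shape as an extension method are my own invention, not the code from my project:

using System.Text;

public static class LogSnippet
{
    // Collects up to `count` non-newline characters, touching only as much
    // of the (possibly multi-kilobyte) payload as necessary.
    public static string Snippet(this string s, int count)
    {
        var sb = new StringBuilder(count);
        for (int i = 0; i < s.Length && sb.Length < count; i++)
        {
            if (s[i] != '\n')
                sb.Append(s[i]);
        }
        return sb.ToString();
    }
}

// Usage:
// string toLog = json.Snippet(30);

It never copies more than it has to, and it can’t throw on payloads shorter than the requested length.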

No one likes to chafe…

(PS: I know that the Substring() method isn’t safe if you ask for more characters than the string has. Rest assured that in the actual production code I’m using an extension method that takes care of that little detail…)

Posted in Coding, Uncategorized | Tagged | 1 Comment