Should one test internal implementation, or only test public behaviour?

Tags: Unit Testing, Refactoring, Automated Tests, Integration Testing, Code Coverage

Unit Testing Problem Overview


Given software where ...

  • The system consists of a few subsystems
  • Each subsystem consists of a few components
  • Each component is implemented using many classes

... I like to write automated tests of each subsystem or component.

I don't write a test for each internal class of a component (except inasmuch as each class contributes to the component's public functionality and is therefore testable/tested from outside via the component's public API).

When I refactor the implementation of a component (which I often do, as part of adding new functionality), I therefore don't need to alter any existing automated tests, because the tests depend only on the component's public API, and the public APIs are typically expanded rather than altered.

I think this policy contrasts with a document like Refactoring Test Code, which says things like ...

  • "... unit testing ..."
  • "... a test class for every class in the system ..."
  • "... test code / production code ratio ... is ideally considered to approach a ratio of 1:1 ..."

... all of which I suppose I disagree with (or at least don't practice).

My question is, if you disagree with my policy, would you explain why? In what scenarios is this degree of testing insufficient?

In summary:

  • Public interfaces are tested (and retested), and rarely change (they're added to but rarely altered)
  • Internal APIs are hidden behind the public APIs, and can be changed without rewriting the test cases which test the public APIs

Footnote: some of my 'test cases' are actually implemented as data. For example, test cases for the UI consist of data files which contain various user inputs and the corresponding expected system outputs. Testing the system means having test code which reads each data file, replays the input into the system, and asserts that it gets the corresponding expected output.

Although I rarely need to change test code (because public APIs are usually added to rather than changed), I do find that I sometimes (e.g. twice a week) need to change some existing data files. This can happen when I change the system output for the better (i.e. new functionality improves existing output), which might cause an existing test to 'fail' (because the test code only tries to assert that the output hasn't changed). To handle these cases I do the following (a sketch of such a harness appears after this list):

  • Rerun the automated test suite with a special run-time flag, which tells it not to assert on the output, but instead to capture the new output into a new directory
  • Use a visual diff tool to see which output data files (i.e. what test cases) have changed, and to verify that these changes are good and as expected given the new functionality
  • Update the existing tests by copying new output files from the new directory into the directory from which test cases are run (over-writing the old tests)
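
For illustration, a minimal sketch of such a data-driven harness is shown below (not the author's actual code). It assumes hypothetical `testdata/input`, `testdata/expected` and `testdata/captured` directories, an invented `SystemUnderTest.process()` entry point into the public API, and a `capture.output` system property standing in for the special run-time flag; JUnit 5 parameterized tests provide the per-file iteration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.MethodSource;

class DataDrivenTest {

    // One test case per data file in testdata/input (hypothetical layout).
    static Stream<Path> inputFiles() throws IOException {
        return Files.list(Paths.get("testdata", "input")).sorted();
    }

    @ParameterizedTest
    @MethodSource("inputFiles")
    void replayInputAndCheckOutput(Path inputFile) throws IOException {
        String input = Files.readString(inputFile);
        String actual = SystemUnderTest.process(input);   // invented public-API entry point

        Path expectedFile = Paths.get("testdata", "expected", inputFile.getFileName().toString());

        if (Boolean.getBoolean("capture.output")) {
            // Special run-time flag: don't assert, capture the new output for visual diffing.
            Path capturedFile = Paths.get("testdata", "captured", inputFile.getFileName().toString());
            Files.createDirectories(capturedFile.getParent());
            Files.writeString(capturedFile, actual);
        } else {
            // Normal run: assert that the output hasn't changed.
            Assertions.assertEquals(Files.readString(expectedFile), actual,
                    "Output changed for " + inputFile.getFileName());
        }
    }
}
```

Running the suite normally asserts against the old expected output; running it with `-Dcapture.output=true` writes the new output for visual diffing instead.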

Footnote: by "component", I mean something like "one DLL" or "one assembly" ... something that's big enough to be visible on an architecture or a deployment diagram of the system, often implemented using dozens or a hundred classes, and with a public API that consists of only one or a handful of interfaces ... something that may be assigned to one team of developers (where a different component is assigned to a different team), and which will therefore, according to Conway's Law, have a relatively stable public API.


Footnote: The article Object-Oriented Testing: Myth and Reality says,

> Myth: Black box testing is sufficient. If you do a careful job of test case design using the class interface or specification, you can be assured that the class has been fully exercised. White-box testing (looking at a method's implementation to design tests) violates the very concept of encapsulation.
>
> Reality: OO structure matters, part II. Many studies have shown that black-box test suites thought to be excruciatingly thorough by developers only exercise from one-third to a half of the statements (let alone paths or states) in the implementation under test. There are three reasons for this. First, the inputs or states selected typically exercise normal paths, but don't force all possible paths/states. Second, black-box testing alone cannot reveal surprises. Suppose we've tested all of the specified behaviors of the system under test. To be confident there are no unspecified behaviors we need to know if any parts of the system have not been exercised by the black-box test suite. The only way this information can be obtained is by code instrumentation. Third, it is often difficult to exercise exception and error-handling without examination of the source code.

I should add that I'm doing white-box functional testing: I see the code (in the implementation) and I write functional tests (which drive the public API) to exercise the various code branches (details of the feature's implementation).

Unit Testing Solutions


Solution 1 - Unit Testing

The answer is very simple: you are describing functional testing, which is an important part of software QA. Testing internal implementation is unit testing, which is another part of software QA with a different goal. That's why you feel that people disagree with your approach.

Functional testing is important to validate that the system or subsystem does what it is supposed to do. Anything the customer sees should be tested this way.

A unit test is there to check that the ten lines of code you just wrote do what they are supposed to do. It gives you higher confidence in your code.

Both are complementary. If you work on an existing system, functional testing is probably the first thing to work on. But as soon as you add code, unit-testing it is also a good idea.
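
To make the distinction concrete, here is a hedged sketch (all class names are invented for illustration): the first test checks a few lines of internal code directly, while the second drives only the component's public API.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class ComplementaryTests {

    // Unit test: checks one small internal class directly.
    @Test
    void discountCalculatorAppliesTenPercent() {
        DiscountCalculator calc = new DiscountCalculator();        // invented internal class
        assertEquals(90.0, calc.apply(100.0, 0.10), 0.001);
    }

    // Functional test: drives only the component's public API;
    // DiscountCalculator is an implementation detail it never mentions.
    @Test
    void invoicingComponentProducesDiscountedTotal() {
        InvoicingService service = new InvoicingService();         // invented public facade
        Invoice invoice = service.createInvoice("customer-42");
        invoice.addLine("widget", 100.0);
        assertEquals(90.0, service.totalWithDiscount(invoice), 0.001);
    }
}
```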

Solution 2 - Unit Testing

My practice is to test the internals through the public API/UI. If some internal code cannot be reached from the outside, then I refactor to remove it.

Solution 3 - Unit Testing

I don't have my copy of Lakos in front of me, so rather than cite I will merely point out that he does a better job than I will of explaining why testing is important at all levels.

The problem with testing only "public behavior" is that such a test gives you very little information. It will catch many bugs (just as the compiler will catch many bugs), but cannot tell you where the bugs are. It is common for a badly implemented unit to return good values for a long time and then stop doing so when conditions change; if that unit had been tested directly, the fact that it was badly implemented would have been evident sooner.

The best level of test granularity is the unit level. Provide tests for each unit through its interface(s). This allows you to validate and document your beliefs about how each component behaves, which in turn allows you to test dependent code by only testing the new functionality it introduces, which in turn keeps tests short and on target. As a bonus, it keeps tests with the code they're testing.

To phrase it differently, it is correct to test only public behavior, so long as you notice that every publicly visible class has public behavior.

Solution 4 - Unit Testing

There have been a lot of great responses to this question so far, but I want to add a few notes of my own. As a preface: I am a consultant for a large company that delivers technology solutions to a wide range of large clients. I say this because, in my experience, we are required to test much more thoroughly than most software shops do (save maybe API developers). Here are some of the steps we go through to ensure quality:

  • Internal Unit Test:
    Developers are expected to create unit tests for all the code they write (read: every method). The unit tests should cover positive test conditions (does my method work?) and negative test conditions (does the method throw an ArgumentNullException when one of my required arguments is null?). We typically incorporate these tests into the build process using a tool like CruiseControl.net. (A sketch of both kinds of condition, plus a mocked integration point, follows this list.)
  • System Test / Assembly Test:
    Sometimes this step is called something different, but this is when we begin testing public functionality. Once you know all your individual units function as expected, you want to know that your external functions also work the way you think they should. This is a form of functional verification since the goal is to determine whether the entire system works the way it should. Note that this does not include any integration points. For system test, you should be using mocked-up interfaces instead of the real ones so that you can control the output and build test cases around it.
  • System Integration Test:
    At this stage in the process, you want to connect your integration points to the system. For example, if you're using a credit card processing system, you'll want to incorporate the live system at this stage to verify that it still works. You would want to perform similar testing to system/assembly test.
  • Functional Verification Test:
    Functional verification is users running through the system or using the API to verify that it works as expected. If you've built an invoicing system, this is the stage at which you will execute your test scripts from end to end to ensure that everything works as you designed it. This is obviously a critical stage in the process since it tells you whether you've done your job.
  • Certification Test:
    Here, you put real users in front of the system and let 'em have a go at it. Ideally you've already tested your user interface at some point with your stakeholders, but this stage will tell you whether your target audience likes your product. You might've heard this called something like a "release candidate" by other vendors. If all goes well at this stage, you know you're good to move into production. Certification tests should always be performed in the same environment you'll be using for production (or an identical environment at least).
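
As a hedged illustration of the first two bullets (the class names are invented, Mockito stands in for whichever mocking framework is used, and Java's `IllegalArgumentException` plays the role of .NET's `ArgumentNullException`): one positive condition, one negative condition, and a case where the integration point is mocked so the test controls its output.

```java
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;

class PaymentServiceTest {

    // The integration point (the card gateway) is mocked so the test controls its output;
    // the real gateway is only wired in at system-integration-test time.
    private final CardGateway gateway = mock(CardGateway.class);         // invented interface
    private final PaymentService service = new PaymentService(gateway);  // invented class

    // Positive condition: does my method work?
    @Test
    void chargeIsAcceptedWhenTheGatewayAuthorizes() {
        when(gateway.authorize("order-1", 25.00)).thenReturn(true);
        assertTrue(service.charge("order-1", 25.00).isAccepted());
    }

    // Negative condition: does the method reject a missing required argument?
    @Test
    void chargeRejectsANullOrderId() {
        assertThrows(IllegalArgumentException.class, () -> service.charge(null, 25.00));
    }

    // Negative condition at the boundary: a declined card must not be accepted.
    @Test
    void chargeIsRejectedWhenTheGatewayDeclines() {
        when(gateway.authorize("order-2", 99.00)).thenReturn(false);
        assertFalse(service.charge("order-2", 99.00).isAccepted());
    }
}
```
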

Of course, I know that not everyone follows this process, but if you look at it from end to end, you can begin to see the benefits of the individual components. I haven't included things like build verification tests since they happen on a different timeline (e.g., daily). I personally believe that unit tests are critical, because they give you deep insight into which specific component of your application is failing at which specific use case. Unit tests will also help you isolate which methods are functioning correctly so that you don't spend time looking at them for more information about a failure when there's nothing wrong with them.

Of course, unit tests could also be wrong, but if you develop your test cases from your functional/technical specification (you have one, right? ;)), you shouldn't have too much trouble.

Solution 5 - Unit Testing

If you are practicing pure test-driven development, then you only write production code after you have a failing test, and you only write test code when you have no failing tests. Additionally, you implement only the simplest thing that will produce a failing test or make a failing test pass.

In the limited TDD practice I've had, I've seen how this helps me flush out unit tests for every logical condition produced by the code. I'm not entirely confident that 100% of the logical features of my private code are exposed by my public interfaces. Practicing TDD seems complementary to that metric, but there are still hidden features not reachable through the public APIs.

I suppose you could say this practice protects me against future defects in my public interfaces. Either you find that useful (it lets you add new features more rapidly), or you find that it is a waste of time.
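
A minimal sketch of that rhythm, under the assumption of an invented `PriceRule` class: the test is written first and fails, and then only the simplest code needed to make it pass is added.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Step 1: written first; it fails because PriceRule.bulkDiscount does not exist yet.
class PriceRuleTest {

    @Test
    void tenItemsGetATenPercentBulkDiscount() {
        assertEquals(9.0, PriceRule.bulkDiscount(10, 1.0), 0.001);
    }
}

// Step 2: the simplest implementation that makes the failing test pass.
class PriceRule {

    static double bulkDiscount(int quantity, double unitPrice) {
        double total = quantity * unitPrice;
        return quantity >= 10 ? total * 0.9 : total;
    }
}
```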

Solution 6 - Unit Testing

You can code functional tests; that's fine. But you should validate them by measuring test coverage on the implementation, to demonstrate that all of the code being tested has a purpose relative to the functional tests, and that it actually does something relevant.

Solution 7 - Unit Testing

You shouldn't blindly assume that a unit == a class. I think that can be counterproductive. When I say that I write a unit test, I'm testing a logical unit - "something" that provides some behaviour. A unit may be a single class, or it may be several classes working together to provide that behaviour. Sometimes it starts out as a single class, but evolves to become three or four classes later.

If I start with one class and write tests for that, but later it becomes several classes, I will usually not write separate tests for the other classes - they are implementation details in the unit being tested. This way I allow my design to grow, and my tests are not so fragile.
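
A hedged sketch of what that looks like (names invented): the tests target the behaviour of the unit's public class, and when the implementation later splits into helper classes, those helpers get no tests of their own because the same behavioural tests still cover them.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.List;

import org.junit.jupiter.api.Test;

// The "unit" under test is the report-formatting behaviour, not each class.
class ReportFormatterTest {

    @Test
    void numbersEachEntryOnItsOwnLine() {
        ReportFormatter formatter = new ReportFormatter();   // invented
        assertEquals("1. apples\n2. oranges\n", formatter.format(List.of("apples", "oranges")));
    }
}

// If ReportFormatter later delegates to, say, a LineNumberer and a ReportWriter,
// those helpers remain implementation details: they are exercised through
// ReportFormatterTest and get no dedicated test classes of their own.
```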

I used to think exactly like ChrisW demonstrates in this question - that testing at higher levels would be better, but after getting some more experience my thoughts have moderated to something between that and "every class should have a test class". Every unit should have tests, but I choose to define my units slightly differently from what I once did. It might be the "components" ChrisW talks about, but very often it is also just a single class.

In addition, functional tests can be good enough to prove that your system does what it's supposed to do, but if you want to drive your design with examples/tests (TDD/BDD), lower-level tests are a natural consequence. You could throw those low-level tests away when you are done implementing, but that would be a waste - the tests are a positive side effect. If you decide to do drastic refactorings that invalidate your low-level tests, then you throw them away and write new ones.

Separating the goal of testing/proving your software, and using tests/examples to drive your design/implementation can clarify this discussion a lot.

Update: Also, there are basically two ways of doing TDD: outside-in and inside-out. BDD promotes outside-in, which leads to higher-level tests/specifications. If you start from the details however, you will write detailed tests for all classes.

Solution 8 - Unit Testing

I agree with most of the posts on here, however I would add this:

The primary priority is to test public interfaces, then protected, then private.

Usually public and protected interfaces are a summary of a combination of private and protected interfaces.

Personally: you should test everything. Given a strong set of tests for the smaller functions, you will have higher confidence that the hidden methods work. Also, I agree with another person's comment about refactoring: code coverage will help you determine where the extra bits of code are, so you can refactor those out if necessary.

Solution 9 - Unit Testing

Are you still following this approach? I also believe that this is the right approach. You should only test public interfaces. Now, a public interface can be a service or some component that takes input from some kind of UI or any other source.

But you should be able to evolve the public service or component using the test-first approach, i.e. define a public interface and test it for basic functionality. It will fail. Implement that basic functionality using the background classes' APIs. Write the API to satisfy only this first test case. Then keep asking what more the service can do, and evolve it.

The only balancing decision to be taken is whether to break the one big service or component into a few smaller services and components that can be reused. If you strongly believe a component can be reused across projects, then automated tests should be written for that component. But then the tests written for the big service or component will duplicate functionality already tested for the component.

Certain people may get into a theoretical discussion that this is not unit testing. That's fine. The basic idea is to have automated tests that test your software. So what if it's not at the unit level? If it covers integration with a database (which you control), so much the better.

Let me know if you have developed any good process that works for you since your first post.

Regards, Ameet

Solution 10 - Unit Testing

I personally test protected parts too, because they are "public" to inherited types...
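
One common way to do that, sketched below with an invented `Parser` class: the test declares a small subclass so the protected member becomes reachable, which mirrors exactly the surface an inheriting type sees.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class ParserProtectedTest {

    // Test-only subclass: it can reach the protected member,
    // exactly as any real inheriting type could.
    static class TestableParser extends Parser {              // Parser is invented
        String callNormalize(String raw) {
            return normalize(raw);                            // protected in Parser
        }
    }

    @Test
    void normalizeTrimsAndLowercases() {
        assertEquals("hello", new TestableParser().callNormalize("  Hello "));
    }
}
```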

Solution 11 - Unit Testing

I agree that code coverage should ideally be 100%. This does not necessarily mean 60 lines of code would have 60 lines of test code, but that each execution path is tested. The only thing more annoying than a bug is a bug that hasn't run yet.

By only testing the public API, you run the risk of not testing all instances of the internal classes. I am really stating the obvious by saying that, but I think it should be mentioned. The more each behavior is tested, the easier it is to recognize not only that it is broken, but what is broken.

Solution 12 - Unit Testing

I test private implementation details as well as public interfaces. If I change an implementation detail and the new version has a bug, this allows me to have a better idea of where the error actually is and not just what it is affecting.

Solution 13 - Unit Testing

[An answer to my own question]

Maybe one of the variables that matters a lot is how many different programmers there are coding:

  • Axiom: each programmer should test their own code

  • Therefore: if a programmer writes and delivers one "unit", then they should also have tested that unit, quite possibly by writing a "unit test"

  • Corollary: if a single programmer writes a whole package, then it's sufficient for the programmer to write functional tests of the whole package (no need to write "unit" tests of units within the package, since those units are implementation details to which other programmers have no direct access/exposure).

Similarly, the practice of building "mock" components which you can test against:

  • If you have two teams building two components, each may need to "mock" the other's component so that they have something (the mock) against which to test their own component, before their component is deemed ready for subsequent "integration testing", and before the other team has delivered their component against which your component can be tested (a sketch of this follows the list).

  • If you're developing the whole system then you can grow the entire system ... for example, develop a new GUI field, a new database field, a new business transaction, and one new system/functional test, all as part of one iteration, with no need to develop "mocks" of any layer (since you can test against the real thing instead).
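
As promised in the first bullet, here is a hedged sketch (interface and class names are invented, and Mockito stands in for whichever mocking approach the teams use): one team's `OrderComponent` is tested against a mock of the other team's `InventoryComponent` interface before the real component is delivered.

```java
import static org.junit.jupiter.api.Assertions.assertTrue;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;

class OrderComponentTest {

    @Test
    void orderIsAcceptedWhenInventoryReportsStock() {
        // The other team's component isn't delivered yet, so its public interface is mocked.
        InventoryComponent inventory = mock(InventoryComponent.class);   // invented interface
        when(inventory.inStock("sku-1", 3)).thenReturn(true);

        OrderComponent orders = new OrderComponent(inventory);           // invented component
        assertTrue(orders.place("sku-1", 3).isAccepted());

        // The interaction with the other team's API can also be checked.
        verify(inventory).inStock("sku-1", 3);
    }
}
```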

Solution 14 - Unit Testing

> Axiom: each programmer should test their own code

I don't think this is universally true.

In cryptography, there's a well-known saying: "it's easy to create a cipher so secure you don't know how to break it yourself."

In your typical development process, you write your code, then compile and run it to check that it does what you think it does. Repeat this a bunch of times and you'll feel pretty confident about your code.

Your confidence will make you a less vigilant tester. One who doesn't share your experience with the code will not have the issue.

Also, a fresh pair of eyes may have fewer preconceptions not just about the code's reliability but also about what the code does. As a consequence, they may come up with test cases the code's author hasn't thought of. One would expect those to either uncover more bugs, or spread knowledge about what the code does around the organization a bit more.

Additionally, there's an argument to be made that to be a good programmer you have to worry about edge cases, but to be a good tester you have to worry obsessively ;-) Also, testers may be cheaper, so it may be worth having a separate test team for that reason.

I think the overarching question is this: which methodology is the best at finding bugs in software? I've recently watched a video (no link, sorry) stating that randomized testing is cheaper than and as effective as human-generated tests.
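
As a hedged illustration of what randomized testing can look like (a plain-Java sketch: `Sorter.sort` is an invented method under test, and the JDK's own `Arrays.sort` serves as the trusted oracle), the test generates many random inputs instead of hand-picking cases.

```java
import static org.junit.jupiter.api.Assertions.assertArrayEquals;

import java.util.Arrays;
import java.util.Random;

import org.junit.jupiter.api.Test;

class RandomizedSortTest {

    @Test
    void matchesReferenceSortOnRandomInputs() {
        Random random = new Random(42);                       // fixed seed keeps failures reproducible
        for (int run = 0; run < 1_000; run++) {
            int[] input = random.ints(random.nextInt(50), -100, 100).toArray();

            int[] expected = input.clone();
            Arrays.sort(expected);                            // the JDK sort acts as the oracle

            assertArrayEquals(expected, Sorter.sort(input.clone()));   // Sorter.sort is invented
        }
    }
}
```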

Solution 15 - Unit Testing

It depends on your design and where the greatest value will be. One type of application may demand a different approach to another. Sometimes you barely catch anything interesting with unit tests whereas functional/integration tests yield surprises. Sometimes the unit tests fail hundreds of times during development, catching many, many bugs in the making.

Sometimes it's trivial. The way some classes hang together makes the return on investment of testing every path less enticing, so you may just draw a line and move on to hammering something more important/complicated/heavily used.

Sometimes it's not enough to just test the public API because some particularly interesting logic is lurking within, and it's overly painful to set the system in motion and exercise those particular paths. That's when testing the guts of it does pay off.

These days, I tend to write numerous, (often extremely) simple classes that do one or two things tops. I then implement the desired behaviour by delegating all of the complicated functionality to those inner classes. I.e. I have slightly more complex interactions, but really simple classes.

If I change my implementation and have to refactor some of those classes, I usually don't care. I keep my tests insulated as best I can, so it's often a simple change to get them working again. However, if I do have to throw some of the inner classes away, I often replace a handful of classes and write some entirely new tests instead. I often hear people complaining about having to keep tests up to date after refactoring and, while it's sometimes inevitable and tiresome, if the level of granularity is fine enough, it's usually not a big deal to throw away some code + tests.

I feel this is one of the major differences between designing for testability and not bothering.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | ChrisW | View Question on Stackoverflow |
| Solution 1 - Unit Testing | Philippe F | View Answer on Stackoverflow |
| Solution 2 - Unit Testing | mouviciel | View Answer on Stackoverflow |
| Solution 3 - Unit Testing | darch | View Answer on Stackoverflow |
| Solution 4 - Unit Testing | Ed Altorfer | View Answer on Stackoverflow |
| Solution 5 - Unit Testing | Karl the Pagan | View Answer on Stackoverflow |
| Solution 6 - Unit Testing | Ira Baxter | View Answer on Stackoverflow |
| Solution 7 - Unit Testing | Torbjørn | View Answer on Stackoverflow |
| Solution 8 - Unit Testing | monksy | View Answer on Stackoverflow |
| Solution 9 - Unit Testing | Shameet | View Answer on Stackoverflow |
| Solution 10 - Unit Testing | Ali Shafai | View Answer on Stackoverflow |
| Solution 11 - Unit Testing | ryanv | View Answer on Stackoverflow |
| Solution 12 - Unit Testing | Nathaniel Flath | View Answer on Stackoverflow |
| Solution 13 - Unit Testing | ChrisW | View Answer on Stackoverflow |
| Solution 14 - Unit Testing | Jonas Kölker | View Answer on Stackoverflow |
| Solution 15 - Unit Testing | Mark Simpson | View Answer on Stackoverflow |