Whose responsibility is it to check data validity?

Language Agnostic Problem Overview

I am confused as to whether it is the caller's or the callee's responsibility to check data validity.

Should the callee check that the passed-in arguments are not null and meet any other requirements, so that the callee method can execute normally and successfully, and catch any potential exceptions? Or is it the caller's responsibility to do this?

Language Agnostic Solutions

Solution 1 - Language Agnostic

Both: consumer-side (client) and provider-side (API) validation.

Clients should do it because it means a better experience. For example, why do a network round trip just to be told that you've got one bad text field?

Providers should do it because they should never trust clients (e.g. XSS and man-in-the-middle attacks). How do you know the request wasn't intercepted? Validate everything.

There are several levels of validity:

1. All required fields present, correct formats. This is what the client validates.
2. #1 plus valid relationships between fields (e.g. if X is present then Y is required).
3. #1 plus #2 plus business valid: meets all business rules for proper processing.

Only the provider side can do #2 and #3.
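A minimal Java sketch of the three levels, assuming a hypothetical order form (OrderForm, OrderValidator, and the field names are illustrative, not from the answer). Each level builds on the previous one; following the answer, a client would run only checkFormat, while the provider runs all three.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical order form, used only to illustrate the three levels of validity.
final class OrderForm {
    final String quantity;         // raw text from a form field
    final boolean expressShipping;
    final String phone;            // optional unless express shipping is chosen

    OrderForm(String quantity, boolean expressShipping, String phone) {
        this.quantity = quantity;
        this.expressShipping = expressShipping;
        this.phone = phone;
    }
}

final class OrderValidator {
    // Level 1: required fields present, correct formats (the client's share).
    static List<String> checkFormat(OrderForm f) {
        List<String> errors = new ArrayList<>();
        if (f.quantity == null || !f.quantity.matches("\\d{1,6}")) {
            errors.add("quantity must be a whole number of at most 6 digits");
        }
        if (f.phone != null && !f.phone.matches("\\+?\\d{7,15}")) {
            errors.add("phone has an invalid format");
        }
        return errors;
    }

    // Level 2: level 1 plus relationships between fields (provider side).
    static List<String> checkRelationships(OrderForm f) {
        List<String> errors = checkFormat(f);
        if (f.expressShipping && f.phone == null) {
            errors.add("phone is required for express shipping");
        }
        return errors;
    }

    // Level 3: levels 1 and 2 plus business rules (provider side only).
    static List<String> checkBusinessRules(OrderForm f, int unitsInStock) {
        List<String> errors = checkRelationships(f);
        // Parse only once the format check has passed, so parseInt cannot fail here.
        if (errors.isEmpty() && Integer.parseInt(f.quantity) > unitsInStock) {
            errors.add("only " + unitsInStock + " units in stock");
        }
        return errors;
    }
}
```

An express order without a phone number passes level 1 but fails level 2, which is exactly the kind of error the answer says only the provider side catches.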

Solution 2 - Language Agnostic

For an API the callee should always do proper validation and throw a descriptive exception for invalid data.

Any client with I/O overhead should do basic validation as well.

Solution 3 - Language Agnostic

Validation: Caller vs. Called

The TLDR version is both.

The long version involves who, why, when, how, and what.

Both

Both should be ready to answer the question "can this data be operated on reliably?" Do we know enough about this data to do something meaningful with it? Many will suggest that the reliability of the data should never be trusted, but that only leads to a chicken-and-egg problem. Chasing it endlessly from both ends will not provide meaningful value, but to some degree it is essential.

Both must validate the shape of the data to ensure base usability. If either one does not recognize or understand the shape of the data, there is no way to know how to further handle it with any reliability. Depending on the environment, the data may need to be a particular 'type', which is often an easy way to validate shape. We often consider types that present evidence of common lineage back to a particular ancestor and retain the crucial traits to possess the right shape. Other characteristics might be important if the data is anything other than an in-memory structure, for instance if it is a stream or some other resource external to the running context.

Many languages include data shape checking as a built-in language feature through type or interface checking. However, when favoring composition over inheritance, providing a good mechanism to verify trait existence is incumbent on the implementer. One strategy to achieve this is through dynamic (runtime) programming techniques, particularly type introspection, inference, or reflection.
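For the trait-verification strategy, a minimal Java sketch using reflection; TraitCheck and hasTrait are hypothetical names. It asks whether an object exposes a public instance method with a given signature, regardless of which types the object inherits from. In Java an interface check is usually the more idiomatic choice; this only illustrates the runtime approach.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class TraitCheck {
    // Returns true if obj's class exposes a public instance method with this exact
    // name and parameter list, whether or not it implements a shared interface.
    static boolean hasTrait(Object obj, String methodName, Class<?>... parameterTypes) {
        if (obj == null) {
            return false; // no shape at all
        }
        try {
            // getMethod searches public methods, including inherited ones.
            Method m = obj.getClass().getMethod(methodName, parameterTypes);
            return !Modifier.isStatic(m.getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

For example, TraitCheck.hasTrait(new StringBuilder(), "append", String.class) is true, while a static method such as String.valueOf(Object) does not count as an instance trait.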

Called

The called must validate the domain (the set of inputs) of the given context on which it will operate. The design of the called always implies that it can handle only so many cases of input. Usually these values are broken up into certain subclasses or categories of input. We verify the domain in the called because the called is intimate with the localized constraints. It knows better than anyone else what is good input, and what is not.

Normal values: These values of the domain map to a range. For every foo there is one and only one bar.
Out-of-range/out-of-scope values: These values are part of the general domain, but will not map to a range in the context of the called. No defined behavior exists for these values, and thus no valid output is possible. Frequently, out-of-range checking entails range, limit, or allowed characters (or digits, or composite values). A cardinality check (multiplicity) and subsequently a presence check (null or empty) are special forms of range checking.
Values that lead to illogical or undefined behavior: These are special values, or edge cases, that are otherwise normal but, because of the algorithm design and known environment constraints, would produce unexpected results. For instance, a function that operates on numbers should guard against division by zero, accumulators that would overflow, or unintended loss of precision. Sometimes the operating environment or compiler can warn that these situations may happen, but relying on the runtime or compiler is not good practice, as it may not always be capable of deducing what is possible and what is not. This stage should be largely verification, through secondary validation, that the caller provided good, usable, meaningful input.
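A sketch of the called checking its own domain, assuming a hypothetical Ratings.average that accepts star ratings from 1 to 5. It shows a presence check, a cardinality check that also rules out division by zero, a per-element range check, and an overflow guard using Math.addExact.

```java
import java.util.List;
import java.util.Objects;

final class Ratings {
    static final int MIN = 1;
    static final int MAX = 5;

    // Hypothetical callee: average of star ratings, each expected in [1, 5].
    static double average(List<Integer> ratings) {
        // Presence check: null is outside this function's domain.
        Objects.requireNonNull(ratings, "ratings must not be null");
        // Cardinality check: an empty list would lead to division by zero below.
        if (ratings.isEmpty()) {
            throw new IllegalArgumentException("ratings must not be empty");
        }
        int sum = 0;
        for (Integer r : ratings) {
            // Range check on every element (null elements are rejected too).
            if (r == null || r < MIN || r > MAX) {
                throw new IllegalArgumentException("rating out of range [1, 5]: " + r);
            }
            // Overflow guard: addExact throws ArithmeticException instead of wrapping silently.
            sum = Math.addExact(sum, r);
        }
        return (double) sum / ratings.size();
    }
}
```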

Caller

The caller is special. The caller has two situations in which it should validate data.

The first situation is on assignment or explicit state changes, where a change happens to at least one element of the data by some explicit mechanism, internally, or externally by something in its container. This is somewhat out of scope of the question, but something to keep in mind. The important thing is to consider the context when a state change occurs, and one or more elements that describe the state are affected.

Self/Referential Integrity: Consider using an internal mechanism to validate state if other actors can reference the data. When the data has no consistency checks, it is only safe to assume it is in an indeterminate state. That is not intermediate, but indeterminate. Know thyself. When you do not use a mechanism to validate internal consistency on state change, then the data is not reliable and that leads to problems in the second situation. Make sure the data for the caller is in a known, good state; alternatively, in a known transition/recovery state. Do not make the call until you are ready.
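One way to apply this, sketched in Java with a hypothetical DateRange class: the invariant (start not after end) is validated on every state change, and a rejected update leaves the previous, known-good state untouched. The sketch is not thread-safe.

```java
import java.time.LocalDate;
import java.util.Objects;

// Hypothetical mutable range whose invariant is checked on every state change.
final class DateRange {
    private LocalDate start;
    private LocalDate end;

    DateRange(LocalDate start, LocalDate end) {
        set(start, end);
    }

    // Both bounds change together, so the object is never observed half-updated.
    void set(LocalDate newStart, LocalDate newEnd) {
        Objects.requireNonNull(newStart, "start must not be null");
        Objects.requireNonNull(newEnd, "end must not be null");
        if (newEnd.isBefore(newStart)) {
            throw new IllegalArgumentException("end " + newEnd + " is before start " + newStart);
        }
        // Assign only after validation succeeds: a rejected update keeps the old state.
        this.start = newStart;
        this.end = newEnd;
    }

    LocalDate start() { return start; }
    LocalDate end() { return end; }
}
```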

The second situation is when the caller calls a function. A caller can expect only so much from the called. The caller must know and respect that the called recognizes only a certain domain. The caller must also be self-interested, as it may continue and persist long after the called completes. This means the caller must help the called be not only successful, but also appropriate for the task: bad data in produces bad data out. By the same token, even data that is good going in and coming out with respect to the called may not be appropriate for the caller's next step. The good data out may actually be bad data in for the caller. The output of the called may invalidate the caller's current state.

Ok, so enough commentary, what should a caller validate specifically?

Logical and normal: given the data, is the called a good strategy that fits the purpose and intent? If we know it will fail with certain values, there is no point in performing the call without the appropriate guards most times. If we know the called cannot handle zero, do not ask it to as it will never succeed. What is more expensive and harder to manage: a [redundant (do we know?)] guard clause, or an exception [that occurs late in a possibly long running, externally available resource dependent process]? Implementations can change, and change suddenly. Providing the protection in the caller reduces the impact and risk in changing that implementation.
Return values: check for unsuccessful completion. This is something that a caller may or may not need to do. Before using or relying upon the returned data, check for alternative outcomes, if the system design incorporates success and failure values that may accompany the actual return value.
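A minimal Java sketch of both caller-side checks, using hypothetical Shipping methods: the caller inspects the lookup's Optional result for an alternative outcome, then guards against a box size it knows the callee rejects, rather than letting the call fail later with an exception.

```java
import java.util.Map;
import java.util.Optional;

final class Shipping {
    // Hypothetical callee; its documented domain is items >= 0 and boxSize > 0.
    static int boxesNeeded(int items, int boxSize) {
        if (items < 0 || boxSize <= 0) {
            throw new IllegalArgumentException("need items >= 0 and boxSize > 0");
        }
        // Ceiling division without the overflow risk of (items + boxSize - 1).
        return items / boxSize + (items % boxSize == 0 ? 0 : 1);
    }

    // Hypothetical lookup that reports "not found" through its return value.
    static Optional<Integer> boxSizeFor(String product, Map<String, Integer> catalog) {
        return Optional.ofNullable(catalog.get(product));
    }

    // The caller: checks the lookup's outcome, then guards against values
    // it already knows boxesNeeded cannot handle.
    static String plan(String product, int items, Map<String, Integer> catalog) {
        Optional<Integer> size = boxSizeFor(product, catalog);
        if (!size.isPresent()) {
            return "unknown product: " + product;
        }
        if (size.get() <= 0) {
            return "no usable box size configured for " + product;
        }
        if (items < 0) {
            return "item count cannot be negative";
        }
        return product + ": " + boxesNeeded(items, size.get()) + " boxes";
    }
}
```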

Footnote: in case it wasn't clear, null is a domain issue. Whether it is logical and normal depends on the context. If null is a natural input to a function, and the function could reasonably be expected to produce meaningful output from it, then leave it to the caller to use it. If null is not logical in the caller's domain, then guard against it in both places.

An important question: if you are passing null to the called, and the called is producing something, isn't that a hidden creational pattern, creating something from nothing?

Solution 4 - Language Agnostic

It's all about the "contract". It's the callee that decides which parameters are acceptable. You may state in the documentation that a null parameter is invalid, and then throwing NullPointerException or IllegalArgumentException is fine.

If returning a result for a null parameter makes sense, state that in the documentation. Usually such a situation indicates bad design: create an overloaded method with fewer parameters instead of accepting null.
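A small Java sketch of the overloading suggestion, with a hypothetical Greeter class: instead of documenting "pass null to get the default salutation", a one-argument overload supplies the default and the two-argument form keeps rejecting null.

```java
import java.util.Objects;

final class Greeter {
    // Full form: both parameters are required and documented as non-null.
    static String greet(String name, String salutation) {
        Objects.requireNonNull(name, "name must not be null");
        Objects.requireNonNull(salutation, "salutation must not be null");
        return salutation + ", " + name + "!";
    }

    // Overload with fewer parameters instead of asking callers to pass null for "default".
    static String greet(String name) {
        return greet(name, "Hello");
    }
}
```

Greeter.greet("Ada") returns "Hello, Ada!", while Greeter.greet("Ada", null) fails fast with a NullPointerException whose message names the bad parameter.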

Just remember to throw descriptive exceptions. As a rule of thumb:

If the caller passed wrong arguments, different from what is described in the documentation (e.g. null, id < 0), throw an unchecked exception (NullPointerException or IllegalArgumentException).
If the caller passed correct arguments but an expected business case makes it impossible to process the call, you may want to throw a descriptive checked exception. For example, for getPermissionsForUser(Integer userId), the caller passes a userId without knowing whether such a user exists, but it is a non-null Integer. Your method may return a list of permissions or throw a UserNotFoundException, which may be a checked exception.
If the parameters are correct according to the documentation but they cause an internal processing error, you may throw an unchecked exception. This usually means that your method is not well tested ;-)
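The rule of thumb applied to the getPermissionsForUser example, as a Java sketch. PermissionService and its in-memory map are assumptions standing in for a real user store; the split between unchecked exceptions (contract violations) and the checked UserNotFoundException (an expected business case) follows the answer.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Checked exception for an expected business case: the id is well formed,
// but no such user exists.
class UserNotFoundException extends Exception {
    UserNotFoundException(int userId) {
        super("no user with id " + userId);
    }
}

// Hypothetical service; the in-memory map stands in for a real user store.
class PermissionService {
    private final Map<Integer, List<String>> permissionsByUser;

    PermissionService(Map<Integer, List<String>> permissionsByUser) {
        this.permissionsByUser = Map.copyOf(permissionsByUser);
    }

    List<String> getPermissionsForUser(Integer userId) throws UserNotFoundException {
        // Arguments that violate the documented contract: unchecked exceptions.
        Objects.requireNonNull(userId, "userId must not be null");
        if (userId < 0) {
            throw new IllegalArgumentException("userId must be >= 0, was " + userId);
        }
        // Valid argument, but a business case prevents processing: checked exception.
        List<String> permissions = permissionsByUser.get(userId);
        if (permissions == null) {
            throw new UserNotFoundException(userId);
        }
        return permissions;
    }
}
```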

Solution 5 - Language Agnostic

Well... it depends.

If you can be sure how to handle invalid data inside your callee then do it there.

If you are not sure (e.g. because your method is quite general and used in a few different places and ways) then let the caller decide.

For example, imagine a DAO method that has to retrieve a certain entity, and you don't find it. Can you decide whether to throw an exception, maybe roll back a transaction, or just consider it okay? In cases like this it is definitely up to the caller to decide how to handle it.
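A Java sketch of leaving that decision to the caller, assuming a hypothetical CustomerDao whose finder returns java.util.Optional instead of deciding what "not found" means. Two hypothetical callers then make opposite, context-specific choices.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical DAO: it reports "not found" but does not decide what that means.
class CustomerDao {
    private final Map<Long, String> namesById = new HashMap<>();

    void save(long id, String name) {
        namesById.put(id, name);
    }

    Optional<String> findNameById(long id) {
        return Optional.ofNullable(namesById.get(id));
    }
}

class Callers {
    // Caller 1: a missing customer is an error in this context (e.g. before billing).
    static String nameForInvoice(CustomerDao dao, long id) {
        return dao.findNameById(id)
                  .orElseThrow(() -> new NoSuchElementException("customer " + id + " not found"));
    }

    // Caller 2: a missing customer is acceptable here, so fall back to a default.
    static String nameForGreeting(CustomerDao dao, long id) {
        return dao.findNameById(id).orElse("guest");
    }
}
```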

Solution 6 - Language Agnostic

Both. This is a matter of good software development on both sides and independent of environment (C/S, web, internal API) and language.

The callee should be validating all parameters against the well documented parameter list (you did document it, right?). Depending on the environment and architecture, good error messages or exceptions should be implemented to give clear indication of what is wrong with the parameters.

The caller should be ensuring that only appropriate parameter values are passed in the API call. Any invalid values should be caught as soon as possible and somehow reflected to the user.

As often occurs in life, neither side should just assume that the other guy will do the right thing and ignore the potential problem.

Solution 7 - Language Agnostic

Depends on whether you program nominally, defensively, or totally.

If you program defensively (my personal favourite for most Java methods), you validate input in the method. You throw an exception (or fail in another way) when validation fails.
If you program nominally, you don't validate input (but expect the client to make sure the input is valid). This method is useful when validation would adversely impact performance, because the validation would take a lot of time (like a time-consuming search).
If you program totally (my personal favourite for most Objective-C methods), you validate input in the method, but you change invalid input into valid input (like by snapping values to the nearest valid value).

In most cases you would program defensively (fail-fast) or totally (fail-safe). Nominal programming is risky IMO and should be avoided when expecting input from an external source.

Of course, don't forget to document everything (especially when programming nominally).
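The three styles side by side in a hypothetical Java Volume class whose valid range is assumed to be 0 to 100; the method names are illustrative. Note the nominal setter's documented precondition, since nothing else protects it.

```java
// Hypothetical volume control whose valid range is 0..100 inclusive.
final class Volume {
    static final int MIN = 0;
    static final int MAX = 100;
    private int level;

    int level() { return level; }

    /** Defensive (fail-fast): rejects out-of-range input with an exception. */
    void setDefensively(int newLevel) {
        if (newLevel < MIN || newLevel > MAX) {
            throw new IllegalArgumentException("level must be in [0, 100], was " + newLevel);
        }
        level = newLevel;
    }

    /**
     * Nominal: performs no check. Precondition: 0 <= newLevel <= 100.
     * Violating it leaves the object in an invalid state.
     */
    void setNominally(int newLevel) {
        level = newLevel;
    }

    /** Total (fail-safe): snaps out-of-range input to the nearest valid value. */
    void setTotally(int newLevel) {
        level = Math.max(MIN, Math.min(MAX, newLevel));
    }
}
```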

Solution 8 - Language Agnostic

I'm going to take a different perspective on the question. When working inside a contained application, both caller and callee are in the same codebase. In that case, any validation that is required by the callee's contract should be done by the callee.

So if you've written a function and your contract says "Does not accept NULL values.", you should check that no NULL values have been passed and raise an error. This ensures that your code is correct, and if someone else's code is doing something it shouldn't, they'll know about it sooner.

Furthermore, if you assume that other code will call your method correctly, and they don't, it will make tracking the source of potential bugs more difficult.

This is essential for "Fail Early, Fail Often" where the idea is to raise an error condition as soon as a problem is detected.

Solution 9 - Language Agnostic

It is the callee's responsibility to validate data, because only the callee knows what is valid. It is also good security practice.

Solution 10 - Language Agnostic

Validation needs to happen on both ends: on the client side and on the server side (in both callee and caller).

Client:

This is the most effective one.
Client validation saves a request to the server.
It reduces bandwidth usage.
It avoids time-consuming round trips (when the server is slow to respond).

Server:

Don't trust data from the UI (it may have been tampered with by attackers).
Backend code is often reused, so we don't know whether the data will be null, etc.; therefore we need to validate in both the callee and caller methods.

Overall,

If data comes from the UI, it's always better to validate in the UI layer and double-check in the server layer.
If data is transferred within the server layer itself, we need to validate in the callee and, as a double check, in the caller as well.


Solution 11 - Language Agnostic

In my humble opinion, and in a few more words explaining why, it is the callee's responsibility most of the time, but that doesn't mean the caller is always scot-free.

The reason why is that the callee is in the best position to know what it needs to do its work, because it's the one doing the work. It's thus good encapsulation for the object or method to be self-validating. If the callee can do no work on a null pointer, that's an invalid argument and should be thrown back out as such. If there are arguments out of range, that's easy to guard against as well.

However, "ignorance of the law is no defense". It's not a good pattern for a caller to simply shove everything it's given into its helper function and let the callee sort it out. The caller adds no value when it does this, for one thing, especially if what the caller shoves into the callee is data it was itself given by its own caller, meaning this layer of the call stack is likely redundant. It also makes both the caller's and callee's code very complex, as both sides "defend" against unwanted behavior by the other (the callee trying to salvage something workable and testing everything, and the caller wrapping the call in try-catch statements that attempt to correct the call).

The caller should therefore validate what it can know about the requirements for passed data. This is especially true when there is a time overhead inherent in making the call, such as when invoking a service proxy. If you have to wait a significant portion of a second to find out your parameters are wrong, when it would take a few ticks to do the same client-side, the advantage is obvious. The callee's guard clauses are exactly that; the last line of defense and graceful failure before something ugly gets thrown out of the actual work routine.
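A Java sketch of cheap caller-side validation in front of an expensive call. RemoteUserService is a hypothetical interface standing in for a service proxy; RegistrationClient rejects obviously bad input locally, so only plausible requests pay for the round trip.

```java
import java.util.regex.Pattern;

// Hypothetical remote service; each call is assumed to cost a network round trip.
interface RemoteUserService {
    boolean register(String email);
}

final class RegistrationClient {
    // Deliberately loose shape check; the service must still do its own full validation.
    private static final Pattern EMAIL_SHAPE = Pattern.compile("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");

    private final RemoteUserService service;

    RegistrationClient(RemoteUserService service) {
        this.service = service;
    }

    // Returns "registered" on success, otherwise a message describing the problem.
    String register(String email) {
        // Cheap caller-side checks: answer locally instead of after a round trip.
        if (email == null || email.trim().isEmpty()) {
            return "email is required";
        }
        if (!EMAIL_SHAPE.matcher(email).matches()) {
            return "email looks malformed";
        }
        // Only plausible requests reach the service, whose guard clauses
        // remain the last line of defense.
        return service.register(email) ? "registered" : "rejected by service";
    }
}
```

Passing a lambda that counts invocations as the RemoteUserService makes it easy to confirm that malformed addresses never reach the service.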

Solution 12 - Language Agnostic

There should be something between caller and callee called a contract. The callee ensures that it does the right thing if the input data is within the specified values. It should still check that the incoming data is correct according to those specifications. In Java you could throw an IllegalArgumentException.

The caller should also work within the contract specifications. Whether it should check the data it hands over depends on the case. Ideally you should program the caller in a way that makes checking unnecessary, because you are sure of the validity of your data. If it is, e.g., user input, you cannot be sure that it is valid; in this case you should check it. If you don't check it, you at least have to handle the exceptions and react accordingly.
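Both caller options for user input, sketched in Java with a hypothetical AgeInput helper (the upper bound of 150 is an arbitrary assumption): either validate first and get an OptionalInt, or call Integer.parseInt directly and handle the NumberFormatException it throws. The second variant performs no range check.

```java
import java.util.OptionalInt;

final class AgeInput {
    static final int MAX_AGE = 150; // assumed upper bound for this example

    // Option 1: check the user input before using it; no exception can escape.
    static OptionalInt parseAge(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        String s = raw.trim();
        // At most 3 ASCII digits, so parseInt below cannot throw.
        if (!s.matches("\\d{1,3}")) {
            return OptionalInt.empty();
        }
        int age = Integer.parseInt(s);
        return age <= MAX_AGE ? OptionalInt.of(age) : OptionalInt.empty();
    }

    // Option 2: skip the pre-check, but handle the exception the callee throws.
    // Integer.parseInt signals bad input (including null) with NumberFormatException.
    static int parseAgeOrDefault(String raw, int fallback) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```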

Solution 13 - Language Agnostic

The callee has the responsibility of checking that the data it receives is valid. Failure to perform this task will almost certainly result in unreliable software and exposes you to potential security issues.

Having said that, if you have control of the client (caller) code, then you should also perform at least some validation there, since it will result in a better overall experience.

As a general rule, try to catch problems with data as early as possible; it results in far less trouble further down the line.

Content Type	Original Author	Original Content on Stackoverflow
Question	hiway	View Question on Stackoverflow
Solution 1 - Language Agnostic	duffymo	View Answer on Stackoverflow
Solution 2 - Language Agnostic	Thihara	View Answer on Stackoverflow
Solution 3 - Language Agnostic	JustinC	View Answer on Stackoverflow
Solution 4 - Language Agnostic	Piotr Gwiazda	View Answer on Stackoverflow
Solution 5 - Language Agnostic	Marco Forberg	View Answer on Stackoverflow
Solution 6 - Language Agnostic	cdkMoose	View Answer on Stackoverflow
Solution 7 - Language Agnostic	Constantino Tsarouhas	View Answer on Stackoverflow
Solution 8 - Language Agnostic	CLo	View Answer on Stackoverflow
Solution 9 - Language Agnostic	Grzegorz Żur	View Answer on Stackoverflow
Solution 10 - Language Agnostic	Hariharan	View Answer on Stackoverflow
Solution 11 - Language Agnostic	KeithS	View Answer on Stackoverflow
Solution 12 - Language Agnostic	André Stannek	View Answer on Stackoverflow
Solution 13 - Language Agnostic	wobblycogs	View Answer on Stackoverflow