Validate & Verify

1. Introduction

A common problem with computer systems is that it is very easy to put incorrect data into them.

If you put incorrect data into a computer system then you will get incorrect results out of it. Processing incorrect inputs will produce incorrect outputs. This leads to the acronym :

GIGO : Garbage In Garbage Out

Sometimes incorrect data can actually cause a computer system to stop working temporarily. This is a particular problem in batch processing systems when data may be processed overnight. If incorrect data stops a batch processing system from working then a whole night's processing time may be lost.

People who develop computer systems go to a lot of trouble to make it difficult for incorrect data to be entered. The two main techniques used for this purpose are :

Validation : A validation check is an automatic check made by a computer to ensure that any data entered into the computer is sensible.
Verification : A verification check ensures that data is correctly transferred into a computer from the medium that it was originally stored on.

2. Validations

A validation check is an automatic check made by computer software to ensure that any data entered into the computer is sensible. There are many different methods of validation. The most appropriate method(s) to use will depend upon what data is being entered. The most common methods are listed here.

Validation Checks
Presence Check : Checks that data has been entered into a field and that it has not been left blank. e.g. check that a surname is always entered into each record in a database of addresses.
Type Check : Checks that an entered value is of a particular type. e.g. check that age is numeric.
Length Check : Checks that an entered value e.g. surname is no longer than a set number of characters.
Range Check : Checks that an entered value falls within a particular range. For example the age of a person should be in the range 0 to 130 years.
Format Check : Checks that an entered value has a particular format or pattern. e.g. a new-style car registration number should consist of 2 letters followed by 2 numbers followed by 3 letters.
Table Lookup Check : Checks that an entered value is one of a pre-defined list of valid entries which should be allowed.
Check Digit : A check digit is a digit attached to the end of a string of digits. It is calculated from the other digits and used to help ensure that the whole string is inputted correctly.
Parity Check : Used in data communications to ensure that data is not corrupted when it is sent down a transmission medium such as a telephone line.
Batch/Hash Totals : Batch or hash totals are used when data is entered into the computer in batches to ensure that the data is entered accurately and no data is missed out.

Validation checks can be performed by any piece of software. However you are most likely to encounter them when creating a new database. Sophisticated database packages will let you implement validation checks using validation rules. You can provide different validation rules for each different field in the database.

3. Presence Check

A presence check is the simplest method of validation. A presence check can be used on any field in a database and simply checks that some data has been entered into the field, i.e. that the field has not been left blank.

Presence checks are used on important fields which must have data entered into them. For example every student in a school must be a member of a form. Therefore a presence check could be carried out on the form field in each student's record to ensure that each student is placed in a form. The check would not ensure that each student was in the correct form.
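As a minimal sketch of a presence check, the Python function below treats a field as blank if it is missing or contains only spaces. The function name and the sample student record are made up for illustration.

```python
def presence_check(value):
    """Return True if some data has been entered, False if the field is blank."""
    return value is not None and str(value).strip() != ""


# Hypothetical student record in which the form field has been left blank.
student = {"surname": "Smith", "form": ""}

print(presence_check(student["surname"]))  # True
print(presence_check(student["form"]))     # False
```

As in the text, the check only shows that something was entered. It cannot tell whether the form entered is the right one.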

4. Type Check

A type check is a validation check which most databases perform automatically on all entered data. When a database is created each field in the database is given a type. Whenever data is entered into a field the database will check that it is of the correct type, e.g. alphabetic or numeric. If it is not then an error message will be displayed and the data will have to be re-entered. Here are some example field names and appropriate types.

Field Name      Type           Valid Data             Invalid Data
Date of Birth   Date           11/03/96               30/02/76, fred
Sex             Alphabetic     Male, Female, Albert   123, WA2
Shoe Size       Numeric        12, 2.3, 12323         6G, house
Postcode        Alphanumeric   W12 6BD

Notice that a type check is not a very good validation check. Many of the entries in the Valid Data column in the table pass the type check but are clearly incorrect.
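A database normally performs type checks itself, but the idea can be sketched in Python. The helper names below are illustrative, and the date format (day/month/two-digit year) is an assumption based on the example dates.

```python
from datetime import datetime


def is_numeric(text):
    """Type check: can the entry be read as a number?"""
    try:
        float(text)
        return True
    except ValueError:
        return False


def is_date(text, fmt="%d/%m/%y"):
    """Type check: is the entry a real calendar date in the assumed format?"""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def is_alphabetic(text):
    """Type check: does the entry contain letters only?"""
    return text.isalpha()


print(is_date("11/03/96"), is_date("30/02/76"))        # True False
print(is_numeric("2.3"), is_numeric("house"))          # True False
print(is_alphabetic("Albert"), is_alphabetic("WA2"))   # True False
```

Note that "Albert" passes the alphabetic check for the Sex field even though it is not a sensible entry, which is the weakness described above.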

5. Length Check

As with type checks, most databases will automatically perform length checks on any entered data. The length check is a validation check which ensures that the data entered is no longer than a specified maximum number of characters. This is particularly important if a fixed length field is being used to store the data. If this is the case then any extra characters typed that made the data longer than the space available to store it would be lost. Here are some example field names and appropriate maximum lengths :

Field Name   Maximum Length   Valid Data        Invalid Data
Title        6                mr, Mrs, George   The Duke Of, Sixteen
Surname      15               Smith, Jones      Smethurst-Whately
County       15               England, Car      The Former Yugoslav Republic of Macedonia

Length checks are usually only performed on alphabetic or alphanumeric data.
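A length check can be sketched in a few lines of Python. The maximum lengths are the ones from the table above. The dictionary and function names are illustrative only.

```python
# Maximum number of characters allowed in each field.
MAX_LENGTHS = {"Title": 6, "Surname": 15, "County": 15}


def length_check(field, value):
    """Return True if the value fits within the field's maximum length."""
    return len(value) <= MAX_LENGTHS[field]


print(length_check("Title", "George"))                # True
print(length_check("Title", "Sixteen"))               # False
print(length_check("Surname", "Smethurst-Whately"))   # False
```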

6. Range Check

Range checks are validation checks which are used on data made up of numbers or dates which must fall into a particular range. A lower and upper boundary for sensible values is specified. Any values which fall outside of this range will be rejected. Most sophisticated databases will let you set valid ranges for each field.

Field Name              Lower Boundary   Upper Boundary
Age                     0                130
Car Engine Size (L)     0.5              8.0
Month                   1                12
Temperature in UK (C)   -20              40

Sometimes there is only one boundary required for a particular field. For example the minimum volume of a cube would be zero cubic centimetres, but there is no maximum volume. When there is only one boundary to check the type of check used is known as a limit check rather than a range check.
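The sketch below shows one way a range check and a limit check could share the same Python function: leaving out one boundary turns the range check into a limit check. The function name is illustrative.

```python
def in_bounds(value, lower=None, upper=None):
    """Range check when both boundaries are given, limit check when only one is."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


print(in_bounds(45, lower=0, upper=130))    # True  (age, range check)
print(in_bounds(131, lower=0, upper=130))   # False (age, range check)
print(in_bounds(250000, lower=0))           # True  (volume, limit check)
print(in_bounds(-5, lower=0))               # False (volume, limit check)
```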

7. Format Check

A format check is a validation check which ensures that entered data is in a particular format or pattern. The format that data must be in is specified using an input mask. The input mask is made up of special characters which indicate what characters may be typed where.

In a particular database the following special characters can be used to define an input mask :

Here are some input masks that could be used to validate three letter codes, car registration numbers and postcodes.

Input Mask   Purpose                   Valid Data                    Invalid Data
LLL          Three Letter Code         ABC, AND, OLD                 AB, B2H, ABCD
LL00LLL      Car Registration Number   AB01CDE, DH53MAN              AB1CDE, HELLO, A823HFA
Ll90 0LL     Postcode                  WA14 9JD, M90 4SJ, BL9 0HN    WAM4 9PM, WA6 13H, M12 9Q
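In a programming language a format check is often written with a regular expression. The Python sketch below converts an input mask into one. The meaning given to each mask character is an assumption inferred from the example data (L = a letter that must be entered, l = an optional letter, 0 = a digit that must be entered, 9 = an optional digit, anything else must be typed exactly as shown), and the function names are illustrative.

```python
import re

# Assumed meaning of each special mask character.
MASK_PATTERNS = {
    "L": "[A-Za-z]",   # letter, required
    "l": "[A-Za-z]?",  # letter, optional
    "0": "[0-9]",      # digit, required
    "9": "[0-9]?",     # digit, optional
}


def mask_to_regex(mask):
    """Build a regular expression that accepts the same entries as the mask."""
    parts = [MASK_PATTERNS.get(ch, re.escape(ch)) for ch in mask]
    return re.compile("".join(parts))


def format_check(mask, value):
    """Return True if the whole value fits the input mask."""
    return mask_to_regex(mask).fullmatch(value) is not None


print(format_check("LL00LLL", "DH53MAN"))    # True
print(format_check("LL00LLL", "HELLO"))      # False
print(format_check("Ll90 0LL", "M90 4SJ"))   # True
print(format_check("Ll90 0LL", "WA6 13H"))   # False
```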

8. Table Lookup Check

Sometimes when you enter a data item into an information system the item should only be one of a list of possible items. For example :

Gender : When entering the gender of a person into a database the only two valid entries could be Male and Female.
Membership Number : When entering a membership number into a database the membership number must be a number that actually exists and belongs to a member.

When this is the case a table lookup check can be used. A table lookup check takes the entered data item and compares it to a list of valid entries that are stored in a database table. If the entry is in the list of valid entries then it is allowed. Otherwise it is rejected.

Table lookup checks are also known as membership list checks or file lookup checks.
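A table lookup check can be sketched in Python using sets in place of database tables. The membership numbers below are invented for the example and the function name is illustrative.

```python
# Lists of valid entries. In a real system these would be stored in database tables.
VALID_GENDERS = {"Male", "Female"}
MEMBERSHIP_NUMBERS = {1001, 1002, 1007}  # hypothetical existing members


def lookup_check(value, valid_entries):
    """Return True if the entered value appears in the list of valid entries."""
    return value in valid_entries


print(lookup_check("Female", VALID_GENDERS))    # True
print(lookup_check("Albert", VALID_GENDERS))    # False
print(lookup_check(1003, MEMBERSHIP_NUMBERS))   # False
```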

9. Check Digit

The check digit is a particularly important method of validation. It is used to ensure that code numbers that are originally produced by a computer are re-entered into another computer correctly. The check digit is a single digit added onto the end of a code number by the computer. The check digit is calculated from the other digits in the number. Check digits are included in bar code numbers.

Producing a Check Digit

This procedure is used to generate a check digit to add to the end of a number. It uses the Modulo-11 weighted check digit calculation. This calculation is used for 10-digit ISBN numbers on books.

1) Start with the original product number e.g. 185813415.
2) Weight each digit by its position in the string and add up the results. Note that the lowest weight used is 2 :

Digit         1    8    5    8    1    3    4    1    5
Weighting   *10   *9   *8   *7   *6   *5   *4   *3   *2   Total
Result       10   72   40   56    6   15   16    3   10     228
 
3) Divide the total by 11 and then subtract the remainder from 11. The check digit is the result of this operation :

228 / 11 = 20 remainder 8

If the remainder is 1 then the result of the subtraction is 10 and the check digit is set to X. If the remainder is 0 then the check digit is 0.

As the remainder is neither 1 nor 0, the check digit is calculated like this :

Check digit is 11-remainder = 11-8 = 3.

4) Add the check digit to the end of the original number to get the complete product number : 1858134153
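The procedure can be sketched in Python as follows. The function name is illustrative. The code treats a result of 10 as X and a result of 11 (a remainder of 0) as 0.

```python
def mod11_check_digit(number):
    """Return the Modulo-11 check digit for a string of digits."""
    # The last digit gets weight 2, the one before it 3, and so on.
    weights = range(len(number) + 1, 1, -1)
    total = sum(int(digit) * weight for digit, weight in zip(number, weights))
    result = (11 - total % 11) % 11  # a remainder of 0 gives check digit 0
    return "X" if result == 10 else str(result)


product_number = "185813415"
print(product_number + mod11_check_digit(product_number))  # 1858134153
```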

Validating a Number Including a Check Digit

The procedure to check if a number with a check digit in it has been inputted correctly is similar to that used to generate the check digit :

1) Input the number including the check digit. e.g. 1858134153.
2) Weight each digit by its position in the string and add up the results :

Digit         1    8    5    8    1    3    4    1    5    3
Weighting   *10   *9   *8   *7   *6   *5   *4   *3   *2   *1   Total
Result       10   72   40   56    6   15   16    3   10    3     231

If the last digit in the number is an X then it is treated as the number 10.
3) Divide the total by 11.

231 / 11 = 21 remainder 0

4) If the remainder is 0 then the number has passed the validation check and so it is likely that it has been inputted correctly.

It is important that each digit is weighted before the numbers are added up. If this was not done then a check digit would not detect transposition errors (where two digits are swapped around). This is a particularly common form of error when numbers are typed.
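A minimal Python sketch of the validation procedure is shown below, with an example of a transposition error being caught. The function name is illustrative, and for simplicity the code does not check that an X only appears in the last position.

```python
def mod11_is_valid(code):
    """Return True if a number that includes its check digit passes the Modulo-11 check."""
    # The check digit gets weight 1, the digit before it 2, and so on.
    weights = range(len(code), 0, -1)
    total = sum(
        (10 if ch == "X" else int(ch)) * weight
        for ch, weight in zip(code, weights)
    )
    return total % 11 == 0


print(mod11_is_valid("1858134153"))  # True  (total is 231)
print(mod11_is_valid("1858134513"))  # False (the 1 and the 5 have been swapped)
```

Both numbers contain exactly the same digits, so an unweighted sum of the digits would be identical for the two and the swap would go unnoticed.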

10. Parity Check

Parity checks are used during transmission of data to detect errors that have been caused by interference or noise. All data is transmitted as a sequence of 1s and 0s. A common type of error that occurs during data transmission is that a bit is swapped from a 0 to a 1 or a 1 to a 0 by electrical interference. Parity checks detect this type of error. A parity check works like this :

Transmission

1) When data is transmitted each character is encoded as a 7-bit binary number. e.g. in ASCII the letter ‘C’ has the code 1000011.
2) An eighth bit is added to make a byte. This bit is called a parity bit.
3) The value of the parity bit depends on whether the system uses even or odd parity -

  • Even Parity : The parity bit is set to make sure there are an even number of 1s in the byte.
  • Odd Parity : The parity bit is set to make sure there are an odd number of 1s in the byte.

For example in an even parity system a parity bit of 1 would be added to the code for C, which contains three 1s, and it would be transmitted as 11000011.

Reception

1) When a character is received the number of 1s in the byte is counted :

  • In an even parity system the receiver checks that each received byte contains an even number of 1s.
  • In an odd parity system the receiver checks that each received byte contains an odd number of 1s.
2) If this is not the case then an error must have occurred. A request will be sent to the transmitter to ask it to send the byte again.

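Parity checks are not very good at detecting burst errors where more than one bit in a byte is changed. If an even number of bits are changed then the count of 1s is still even (or still odd) and the error is not detected.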
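The following Python sketch shows an even parity system at work on the 7-bit code 1000011. Bytes are held as strings of 1s and 0s to keep the example readable, and the function names are illustrative.

```python
def add_even_parity(bits7):
    """Put a parity bit on the front of a 7-bit code so the byte has an even number of 1s."""
    parity_bit = bits7.count("1") % 2  # 1 if the code has an odd number of 1s
    return str(parity_bit) + bits7


def check_even_parity(byte):
    """Receiver's check: True if the byte contains an even number of 1s."""
    return byte.count("1") % 2 == 0


sent = add_even_parity("1000011")
print(sent)                           # 11000011
print(check_even_parity(sent))        # True  (received correctly)
print(check_even_parity("11000010"))  # False (one bit changed, error detected)
print(check_even_parity("11000000"))  # True  (two bits changed, error missed)
```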

Note : Some examination boards treat a parity check as a verification check rather than a validation check as the parity check ensures that data is correctly transferred from one location to another.

11. Batch and Hash Totals

Some information systems process data that is entered into them in batches from documents. These systems typically use batch processing. Standard validation checks such as range and format checks are used to identify typing errors during data entry, but when data is entered in batches there are two extra types of error that may occur : a document may be missed out, or a document may be entered more than once.

Batch and hash totals are two special validation checks that are used when data is entered in batches to identify if one of these types of error occurs.

Batch Total

Before the information on the documents is entered the user counts how many documents there are. This is the batch total.

As the data is entered the computer counts how many documents the data is typed from. After the data has been entered the total that was manually calculated is compared to the computer generated total. If the two differ then a document has been missed or some data has been entered twice.

Hash Total

Hash totals are more sophisticated than batch totals. They can be used to spot missed out or doubly entered documents and can also spot more complex errors such as one document being entered twice at the same time as another is missed out. A particular item on the batch of documents that must be entered is chosen. e.g. Hours Worked. The values of this item for each document that is to be input will be added up manually by the user. The total that is calculated is known as the hash total.

The data on the documents will then be entered. The computer will perform the same totalisation automatically on the chosen item. If the computer calculated total and the manually calculated total differ then a mistake has occurred during data entry.

If a batch or hash total identifies an error then the user will have to go back through all of the documents, checking which have been entered and which have not.

Example

These three documents recording the number of hours worked by the employees at a firm are to be entered in a batch (note that in a real system the number of documents would be larger than this) :

The chosen totals would be calculated manually by the user before the data was entered. Probably only one total would be used. Now suppose that the data was entered but by mistake the user missed out the middle document.

All of the different methods would spot the mistake as the user calculated and computer calculated totals differ. Suppose instead that the user missed out the middle document but entered the last document twice. The computer calculated totals would then be :

The batch total would not spot the mistake as the number of documents entered is correct (3). The hash totals would however spot it as they differ from the user calculated ones.
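The two situations described above can be sketched in Python. The three documents and their hours are invented for illustration and are not the figures from the original example. The function names are also illustrative.

```python
# Hypothetical batch of three documents recording hours worked.
documents = [
    {"employee": "A", "hours": 35},
    {"employee": "B", "hours": 40},
    {"employee": "C", "hours": 28},
]


def batch_total(docs):
    """Batch total: the number of documents."""
    return len(docs)


def hash_total(docs, field="hours"):
    """Hash total: the sum of one chosen item across all the documents."""
    return sum(doc[field] for doc in docs)


# Totals worked out by hand before data entry.
manual_batch, manual_hash = batch_total(documents), hash_total(documents)  # 3 and 103

# Mistake 1: the middle document is missed out.
missed = [documents[0], documents[2]]
print(batch_total(missed) == manual_batch)  # False, mistake spotted
print(hash_total(missed) == manual_hash)    # False, mistake spotted

# Mistake 2: the middle document is missed out and the last one is entered twice.
doubled = [documents[0], documents[2], documents[2]]
print(batch_total(doubled) == manual_batch)  # True, mistake NOT spotted
print(hash_total(doubled) == manual_hash)    # False, mistake spotted
```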

12. Verification

A verification check ensures that data is correctly transferred into a computer from the medium that it was originally stored on.

Verification checks are usually used to check that information written on a data collection form has been correctly typed into a computer by a data entry worker. Verification checks do not ensure that the entered data is correct. If the original form was completed incorrectly then the entered data will pass a verification check despite being incorrect.

The two most common methods of verification are :

Verification Checks
On Screen : Entered data is displayed and the user is prompted to confirm that it matches the data on the input form.
Dual Input : Data is entered twice and the two data sets compared. If there are any differences then the data is checked by the user and corrected.

13. On-Screen

One verification method is to use on screen prompts. After a user has entered some data it is redisplayed on the screen. The user is prompted to read the data and confirm that it has been entered correctly. If the user has entered any data incorrectly she should respond that the data is inaccurate and retype the incorrect parts.

This method of verification is not very reliable. Many users will not read the information that is redisplayed or check it carefully against the source document. However the method is relatively cheap to use.

14. Dual Input

The dual input method of verification is used when data is entered at the keyboard. The data to be entered is typed in twice by two different operators. The two copies of the data are then compared. Any differences are detected. The operators will be prompted to retype the sections that differ until both copies agree. When the two copies agree it is assumed by the computer that the data has been entered correctly.

This method of verification is very reliable. It is highly unlikely that any data can be incorrectly transferred into a computer system when the dual input system is used. Unfortunately it is very expensive to use. If data has to be entered twice then it will take twice as long to enter the data or twice as many people will have to be employed.
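The comparison step of dual input can be sketched in Python. Each operator's typing is held as a dictionary of field names and values. The sample record and the function name are made up for the example.

```python
def dual_input_differences(first_entry, second_entry):
    """Return the names of the fields where the two typed copies disagree."""
    fields = sorted(first_entry.keys() | second_entry.keys())
    return [f for f in fields if first_entry.get(f) != second_entry.get(f)]


# The same form typed in by two different operators (hypothetical data).
operator_1 = {"surname": "Smith", "hours": "35", "postcode": "W12 6BD"}
operator_2 = {"surname": "Smith", "hours": "53", "postcode": "W12 6BD"}

print(dual_input_differences(operator_1, operator_2))  # ['hours']
```

An empty list means the two copies agree, at which point the data would be accepted. Any fields listed would be retyped.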

GCSE ICT Companion 04 - (C) P Meakin 2004