A measure is reliable but not valid

Reliability and validity are sometimes used interchangeably in research and evaluations, but they mean different things. Both concepts are used to evaluate the quality of research: they indicate how well a method, technique or test measures something. Reliability is about the consistency of a measure, and validity is about the accuracy of a measure.

It’s important to consider reliability and validity when you are creating your research design, planning your methods, and writing up your results, especially in quantitative research.

What is Reliability?

Reliability (or precision) refers to consistency. That is, if you use an instrument or test several times under the same conditions, you should get the same results. If the instrument (or the data it produces) is unreliable, the results cannot be repeated, and it becomes impossible to tell how much of any reading reflects the phenomenon or concept being measured rather than measurement error. For example, a broken thermometer that gives a different measurement every time it is placed in the same environment under the same conditions is not reliable.

What is Validity?

Validity simply means that a test or instrument is accurately measuring what it’s supposed to. In evaluations, we usually refer to two types of validity: internal and external.

  • Internal validity refers to the extent to which an instrument (or an evaluation) correctly answers the questions it claims to answer about what is being tested (or evaluated). For example, consider a questionnaire (instrument) that asks people to state the amount of their donations. Is the answer an indication of how charitable people are? Or is it their disposable income that is actually being measured by this instrument?

  • External validity refers to the extent to which the results of an evaluation can be generalised to other situations. That is, the extent to which the sample selection reflects the population. The value of external validity is the ability to generalise the results to a larger population.

A Final Word...

Tests or instruments that are valid are also reliable. For example, a properly functioning thermometer is valid (and reliable) because it measures the correct temperature in a consistent manner every time. However, tests or instruments can be reliable without being valid. For example, a miscalibrated thermometer that always reads one degree off would be reliable (giving you the same result each time) but not valid (because it is not recording the correct temperature).

It’s a common mistake to assume that reliability and validity, as they relate to pre-employment tests, are essentially the same thing. 

They aren’t. And if you’re shopping around for a hiring assessment, it’s important to understand what both concepts mean, why they’re so important, and how they differ.

What is Reliability? 

Of the two terms, assessment reliability is the simpler concept to explain and understand.  

Here’s a good definition of reliability in a research context: if an assessment is reliable, the results will be very similar no matter when someone takes the test. If the results are inconsistent, the test is not considered reliable.  

So, if you’re focusing on the reliability of a test, the question to ask is: are the results of the test consistent? If someone takes the test today, a week from now, and a month from now, will their results be the same? 

To determine the reliability of their tests, assessment companies pay close attention to two aspects of reliability in particular: test-retest reliability and internal consistency.

Test-Retest Reliability

To confirm a test’s reliability, assessment companies determine consistency over time with test-retest reliability. With this type, the same group of people is given the test twice (a few days or weeks apart) in order to spot differences in results.

Researchers then calculate the correlation coefficient between the two sets of scores to assess the reliability of the test. A correlation coefficient can range from -1 to 1, but for reliability purposes it is read on a scale from 0 (no correlation) to 1 (perfect correlation). Since no test is going to be completely error-free, a correlation of 0.7 or higher is generally required for a test to be considered reliable.
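The test-retest check described above can be sketched in a few lines of Python. This is only an illustration: the scores are made up, and the helper name pearson_r is an assumption introduced here, not part of any vendor's method.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six people, tested two weeks apart.
first_sitting = [72, 85, 90, 64, 78, 88]
second_sitting = [70, 88, 91, 60, 80, 85]

r = pearson_r(first_sitting, second_sitting)
is_reliable = r >= 0.7  # the 0.7 rule of thumb quoted in the text
print(f"test-retest r = {r:.2f}, meets 0.7 threshold: {is_reliable}")
```

With these invented scores the correlation comes out well above 0.7, so the test would pass the rule of thumb. Real studies use far larger samples than six people.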

Internal Consistency 

Internal consistency looks within a single sitting of the test to confirm that test items that are intended to be related are truly related.

Assessment companies typically measure internal consistency by correlating scores on the first half of the test with those on the second half. Since both halves should be measuring the same thing, the correlation should be 0.7 or higher. For example, if part of a pre-employment assessment is designed to measure math skills, test-takers should score about equally well on the first and second halves of that part of the test.
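As a rough sketch of the split-half approach, the Python below scores each half of a short math section and correlates the two. The item responses are invented and pearson_r is a helper defined here for illustration. The Spearman-Brown step at the end is a standard adjustment for the fact that each half is shorter than the full test; it is not mentioned in the text above and is included only as an optional extra.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical responses (1 = correct, 0 = incorrect):
# one row per test-taker, eight math items per row.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
]

half = len(responses[0]) // 2
first_half = [sum(row[:half]) for row in responses]   # score on items 1-4
second_half = [sum(row[half:]) for row in responses]  # score on items 5-8

r_halves = pearson_r(first_half, second_half)

# Spearman-Brown correction: estimates reliability of the full-length test.
full_length_estimate = 2 * r_halves / (1 + r_halves)

print(f"split-half r = {r_halves:.2f}")
print(f"Spearman-Brown estimate = {full_length_estimate:.2f}")
```

Splitting into first and second halves follows the description above. Other splits, such as odd versus even items, are also common in practice.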

When deciding between assessments, ask your vendors whether or not their assessment has been validated for pre-employment testing and screening, as a test is not valid in every situation. For example, a personality test might be valid in a clinical setting, but if scores aren’t related to job performance, it’s not valid as a pre-employment assessment.

What is Validity? 

Validity is a bit more complex to define because it’s more difficult to assess than reliability.

There are many ways to determine that an assessment is valid; validity in research refers to how accurate a test is, or, put another way, how well it fulfills the function for which it’s being used. In pre-employment assessments, this means predicting the performance of employees or identifying top talent. 

There are several ways for assessment companies to measure types of validity within tests, including content, criterion-related, and construct validity. 

Content Validity 

An assessment is said to have content validity when the criteria it’s measuring align with and adequately cover the content of the job. The extent to which that content corresponds with success on the job is also part of determining how well the assessment demonstrates content validity.

Here’s an example: a fast typing speed would likely be considered a key part of the job for an executive secretary, but not for an executive. While the executive is probably required to type sometimes, this skill is not nearly as important to performing that job as it would be for the executive secretary. Ensuring that an assessment demonstrates content validity means judging the degree to which test items and job content match each other.

Criterion-Related Validity

An assessment demonstrates criterion-related validity if the results of the assessment are predictive of a function that’s related to job performance.

So how can we tell if an assessment predicts performance? Assessment scores must be statistically evaluated against a measure of employee performance. For example, an employer interested in understanding how well a personality test identifies individuals who are likely to engage in counterproductive work behaviors might compare applicants’ personality test scores to how many accidents or injuries those individuals have on the job, whether they engage in on-the-job drug use, or how many times they ignore company policies.

The degree to which the assessment results are related to a measure of performance—like counterproductive work behaviors—is the extent to which it exhibits criterion-related validity. 
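One simple way to picture this comparison is to correlate assessment scores with a performance measure recorded later. The sketch below uses invented data: a hypothetical conscientiousness-style score and a hypothetical count of policy violations per employee. The helper pearson_r is defined here for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data for six hires: assessment score at application,
# and number of policy violations in the first year on the job.
assessment_scores = [55, 62, 70, 74, 81, 90]
policy_violations = [6, 5, 4, 4, 2, 1]

r = pearson_r(assessment_scores, policy_violations)
print(f"score vs. violations: r = {r:.2f}")
```

Here a strongly negative correlation is the desired result: higher scores go with fewer counterproductive behaviors. The sign you expect depends on how the performance measure is defined.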

Construct Validity 

An assessment demonstrates construct validity if it is related to other assessments measuring the same psychological construct—a construct being a concept used to explain behavior. For example, cognitive ability is a construct that’s used to explain a person’s capacity to understand and solve problems.  

To measure construct validity, an assessment company would statistically compare an assessment to similar tests that, in theory, it should be related to since they are measuring the same thing. There really shouldn’t be a significant relationship between a test that is measuring personality and one that is measuring cognitive ability, because they’re measuring two different constructs. However, the test that is measuring personality should be strongly correlated with other tests measuring personality.
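The two comparisons in the paragraph above can be sketched as follows. All scores are made up, the test names are hypothetical, and pearson_r is a helper defined here for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six people on three tests.
new_personality_test = [12, 18, 25, 31, 36, 40]
established_personality_test = [14, 17, 27, 29, 38, 41]
cognitive_ability_test = [30, 22, 35, 21, 33, 27]

# Same construct: the correlation should be strong.
same_construct_r = pearson_r(new_personality_test, established_personality_test)

# Different constructs: the correlation should be close to zero.
different_construct_r = pearson_r(new_personality_test, cognitive_ability_test)

print(f"personality vs. personality: r = {same_construct_r:.2f}")
print(f"personality vs. cognitive:   r = {different_construct_r:.2f}")
```

With this invented data the new personality test tracks the established one closely and shows almost no relationship with the cognitive test, which is the pattern the text describes.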

Can a Test Be Valid but Not Reliable? 

As you’d expect, a test cannot be valid unless it’s reliable. However, a test can be reliable without being valid.

Let’s unpack this, as it’s common to mix these ideas up. 

If you’re providing a personality test and get the same results from potential hires after testing them twice, you’ve got yourself a reliable test. However, if the personality test isn’t actually measuring the personality traits it claims to, and instead corresponds with an unrelated assessment such as on-the-job skills, this assessment probably isn’t valid. 

Tips to Ensure Your Test is Reliable and Valid 

To make sure the pre-employment assessment you choose is both reliable and valid, check that the vendor focused on creating a valid test in the earliest phases of development.  

Your use of the assessment should always be tied back to a tangible job outcome, objective problem, or measurable personality trait. The industrial-organizational scientists involved with the product should have conducted thorough research and consulted subject matter experts in your field to review test questions and ensure they’re designed for what they’re intended to measure. Additionally, the sample population used for test development should be appropriately representative of the population as a whole that may use the assessment. (For example, you wouldn’t want a test that was developed on a homogeneous population or a small sample.)

Next, you can check guardrails for reliability that your potential vendors have put in place by asking: 

  • “Does the assessment use clear, easy-to-understand language with a variety of questions to measure each category?”  
  • “Did the industrial-organizational researchers review the test items for bias?”  
  • “What was the sample population used for developing and validating the test?” 

The assessment you choose should also come with detailed instructions that decrease any variations in testing conditions as much as possible, from time given for test-taking to noise levels in the testing environment. 

In understanding the nuances of reliability vs validity, you’ll see that both distinct concepts are necessary for the success of every test you use. From cognitive ability tests to personality tests to emotional intelligence tests, any pre-employment assessment needs to measure what it intends to measure and produce consistent results over time to be useful to you and your company.  

What is an example of a measure that is reliable but not valid?

A measurement can be reliable but not valid. Suppose your bathroom scale was reset to read 10 pounds lighter. The weight it reads will be reliable (the same every time you step on it) but will not be valid, since it is not reading your actual weight.
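The bathroom scale example can be written out as a small Python sketch. The true weight of 150 pounds and the helper name weigh are assumptions made up for illustration. The spread of repeated readings stands in for reliability, and the gap between the average reading and the true weight stands in for validity.

```python
from statistics import mean, pstdev

TRUE_WEIGHT_LB = 150.0   # hypothetical true weight
SCALE_OFFSET_LB = -10.0  # the scale was reset to read 10 pounds lighter

def weigh(true_weight, offset):
    """Reading from a scale that is consistent but shifted by a fixed offset."""
    return true_weight + offset

# Step on the scale five times in a row.
readings = [weigh(TRUE_WEIGHT_LB, SCALE_OFFSET_LB) for _ in range(5)]

spread = pstdev(readings)              # 0 means perfectly consistent (reliable)
bias = mean(readings) - TRUE_WEIGHT_LB # nonzero means inaccurate (not valid)

print(f"readings: {readings}")
print(f"spread = {spread}, bias = {bias}")
```

The spread is zero because every reading is identical, while the bias is 10 pounds low: consistent, but consistently wrong.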

Is a measure reliable or valid?

Reliability and validity are both about how well a method measures something: Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions). Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).

Can a measure be reliable but not valid?

A measure can be reliable, but not valid. However, a measure cannot be valid unless it is reliable. Reliability is a necessary but not sufficient condition for validity. For example, you can reliably measure eye color, but it may not be related to job performance at all.

What does it mean if a questionnaire is reliable but not valid?

A measure can be reliable but not valid, if it is measuring something very consistently but is consistently measuring the wrong construct.