Which variable is measured as the effect, or outcome, of the independent variable?

To understand the concepts of independent and dependent variables, one should first understand what a variable is. Variables are the properties or characteristics of events or objects that can take on different values.

Independent variables are the variables that researchers manipulate or change and whose effects are measured and compared. Independent variables are also called predictors, because they predict or forecast the values of the dependent variable in the model.

FIGURE 4.7. Seven categories of stylus strokes or taps used as dependent variables in the evaluation of a handwriting recognition system with predictive aids [from MacKenzie et al., 2006]. Copyright © 2006.

Read-text events, button errors, tilt errors, preparation time, scripting time, keystroke savings, and so on are examples of dependent variables devised by researchers to gain insight into the performance nuances of an entry technique. As a final comment, if a dependent variable is a count of occurrences of an observable event, then it is useful to normalize it by converting the count into a ratio, for example, the number of such events per character of input. This facilitates comparisons across conditions and potentially across studies. If, for example, we are interested in participants' error-correcting behavior and decide to use presses of the BACKSPACE key as a dependent variable, then it is more instructive to analyze and report “BACKSPACE key presses per character” than the raw count of BACKSPACE key presses.
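
As a small sketch of this kind of normalization (the function name and example numbers are illustrative, not taken from the original studies), the ratio can be computed directly from per-trial counts:

```python
# Normalize an event count (e.g., BACKSPACE presses) by the number of
# characters of input, so the measure is comparable across conditions.
def events_per_character(event_count: int, characters_entered: int) -> float:
    """Return the event rate per character of input."""
    if characters_entered == 0:
        return 0.0
    return event_count / characters_entered

# Example: 7 BACKSPACE presses while entering a 43-character phrase.
print(events_per_character(7, 43))  # ~0.163 presses per character
```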


URL: //www.sciencedirect.com/science/article/pii/B9780123735911500048

Designing HCI Experiments

I. Scott MacKenzie, in Human-computer Interaction, 2013

5.5 Dependent variables

A dependent variable is a measured human behavior. In HCI the most common dependent variables relate to speed and accuracy, with speed often reported in its reciprocal form, time—task completion time. Accuracy is often reported as the percentage of trials or other actions performed correctly or incorrectly. In the latter case, accuracy is called errors or error rate. The dependent in dependent variable refers to the variable being dependent on the human. The measurements depend on what the participant does. If the dependent variable is, for example, task completion time, then clearly the measurements are highly dependent on the participant’s behavior.

Besides speed and accuracy, a myriad of other dependent variables are used in HCI experiments. Others include preparation time, action time, throughput, gaze shifts, mouse-to-keyboard hand transitions, presses of backspace, target re-entries, retries, key actions, wobduls, etc. The possibilities are limitless.

Now, if you are wondering about “wobduls,” then you’re probably following the discussion. So what is a wobdul? Well, nothing, really. It’s just a made-up word. It is mentioned only to highlight something important in dependent variables: Any observable, measurable aspect of human behavior is a potential dependent variable. Provided the behavior has the ability to differentiate performance between two test conditions in a way that might shed light on the strengths or weaknesses of one condition over another, then it is a legitimate dependent variable. So when it comes to dependent variables, it is acceptable to “roll your own.” Of course, it is essential to clearly define all dependent variables to ensure the research can be replicated.

An example of a novel dependent variable is “negative facial expressions” defined by Duh et al. [2008] in a comparative evaluation of three mobile phones used for gaming. Participants were videotaped playing games on different mobile phones. A post-test analysis of the videotape was performed to count negative facial expressions such as frowns, confusion, frustration, and head shakes. The counts were entered in an analysis of variance to determine whether participants had different degrees of difficulty with any of the interfaces.

Another example is “read text events.” In pilot testing a system using an eye tracker for text entry [eye typing], it was observed that users frequently shifted their point of gaze from the on-screen keyboard to the typed text to monitor their progress [Majaranta et al., 2006]. Furthermore, there was a sense that this behavior was particularly prominent for one of the test conditions. Thus RTE [read text events] was defined and used as a dependent variable. The same research also used “re-focus events” [RFE] as a dependent variable. RFE was defined as the number of times a participant refocuses on a key to select it.

Unless one is investigating mobile phone gaming or eye typing, it is unlikely negative facial expressions, read text events, or refocus events are used as dependent variables. They are mentioned only to emphasize the merit in defining, measuring, and analyzing any human behavior that might expose differences in the interfaces or interaction techniques under investigation.

As with independent variables, it is often helpful to name the variable separately from its units. For example, in a text entry experiment there is likely a dependent variable called text entry speed with units “words per minute.” Experiments on computer pointing devices often use a Fitts’ law paradigm for testing. There is typically a dependent variable named throughput with units “bits per second.” The most common dependent variable is task completion time with units “seconds” or “milliseconds.” If the measurement is a simple count of events, there is no unit per se.
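
For illustration, here is a minimal sketch of how one such dependent variable, text entry speed with units “words per minute,” might be computed. The five-characters-per-word convention is common in text entry research, though the exact timing rules vary between studies, so treat this as an assumption rather than the chapter's own procedure:

```python
# Compute text entry speed in words per minute (wpm), using the common
# convention that one "word" is five characters (including spaces).
def words_per_minute(transcribed_text: str, time_seconds: float) -> float:
    words = len(transcribed_text) / 5.0
    minutes = time_seconds / 60.0
    return words / minutes

# Example: a 43-character phrase entered in 15 seconds -> 34.4 wpm.
print(round(words_per_minute("the quick brown fox jumps over the lazy dog", 15.0), 1))
```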

When contriving a dependent variable, it is important to consider how the measurements are gathered and the data collected, organized, and stored. The most efficient method is to design the experimental software to gather the measurements based on time stamps, key presses, or other interactions detectable through software events. The data should be organized and stored in a manner that facilitates follow-up analyses. Figure 5.3 shows an example for a text entry experiment. There are two data files. The first contains timestamps and key presses, while the second summarizes entry of a complete phrase, one line per phrase.

Figure 5.3. Example data files from a text entry experiment: [a] The summary data one [sd1] file contains timestamps and keystroke data. [b] The summary data two [sd2] file contains one line for each phrase of entry.

The data files in Figure 5.3 were created through the software that implements the user interface or interaction technique. Pilot testing is crucial. Often, pilot testing is considered a rough test of the user interface—with modifications added to get the interaction right. And that’s true. But pilot testing is also important to ensure the data collected are correct and available in an appropriate format for follow-on analyses. So pilot test the experiment software and perform preliminary analyses on the data collected. A spreadsheet application is often sufficient for this.

To facilitate follow-up analyses, the data should also include codes to identify the participants and test conditions. Typically, this information is contained in additional columns in the data or in the filenames. For example, the filename for the data in Figure 5.3a is TextInputHuffman-P01-D99-B06-S01.sd1 and identifies the experiment [TextInputHuffman], the participant [P01], the device [D99], the block [B06] and the session [S01]. The suffix is “sd1” for “summary data one.” Note that the sd2 file in Figure 5.3b is comma-delimited to facilitate importing and contains a header line identifying the data in each column below.
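
As a rough sketch of how such follow-up analyses might begin (the filename pattern and the code below follow the example above, but the function names are otherwise assumptions, not part of the original software), one can decode the condition codes from the filename and load the comma-delimited sd2 file:

```python
import csv
import re

# Decode participant, device, block, and session codes from a filename
# such as "TextInputHuffman-P01-D99-B06-S01.sd2".
def parse_filename(filename: str) -> dict:
    match = re.match(r"(?P<experiment>\w+)-P(?P<participant>\d+)-D(?P<device>\d+)"
                     r"-B(?P<block>\d+)-S(?P<session>\d+)\.sd\d", filename)
    if match is None:
        raise ValueError(f"Unrecognized filename: {filename}")
    return match.groupdict()

# Load a comma-delimited sd2 file whose first line is a header naming each column.
def load_sd2(path: str) -> list:
    with open(path, newline="") as f:
        return list(csv.DictReader(f))  # one dict per phrase of entry

codes = parse_filename("TextInputHuffman-P01-D99-B06-S01.sd2")
print(codes["participant"], codes["block"])  # '01' '06'
```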

If the experiment is conducted using a commercial product, it is often impossible to collect data through custom experimental software. Participants are observed externally, rather than through software. In such cases, data collection is problematic and requires a creative approach. Methods include manual timing by the experimenter, using a log sheet and pencil to record events, or taking photos or screen snaps of the interaction as entry proceeds. A photo is useful, for example, if results are visible on the display at the end of a trial. Videotaping is another option, but follow-up analyses of video data are time consuming. Companies such as Noldus [www.noldus.com] offer complete systems for videotaping interaction and performing post hoc timeline analyses.


URL: //www.sciencedirect.com/science/article/pii/B9780124058651000054

Data Modeling

David Nettleton, in Commercial Data Mining, 2014

Summary of the use of regression techniques

If the dependent variable has more than two categories [that is, it’s not binary], a discriminant analysis can be used to identify the variables that best classify the data. If the dependent variable is continuous [numerical], a linear regression could be used to predict the values of the dependent variable from a set of independent variables.

If the formula to be fitted is known beforehand, and its parameters are non-linear, then a non-linear technique would be the most appropriate to use. If the dependent variable is binary, as is the case for a diagnosis whose result is positive or negative, then a logistic regression model would be used. If the variable is biased, as in “time since last purchase,” appropriate techniques would include “Life Tables,” “Kaplan–Meier,” or a “Cox”-type regression.
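
A minimal sketch of the binary and continuous cases, assuming scikit-learn is available (the library choice and the synthetic data are illustrative, not from the chapter):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))           # independent (predictor) variables

# Continuous dependent variable -> linear regression.
y_continuous = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
linear_model = LinearRegression().fit(X, y_continuous)

# Binary dependent variable (e.g., positive/negative diagnosis) -> logistic regression.
y_binary = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)
logistic_model = LogisticRegression().fit(X, y_binary)

print(linear_model.coef_, logistic_model.predict_proba(X[:2]))
```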


URL: //www.sciencedirect.com/science/article/pii/B9780124166028000091

Quantitative Cross-national Research Methods

G. Esping-Andersen, A. Przeworski, in International Encyclopedia of the Social & Behavioral Sciences, 2001

2 Qualitative and Limited Dependent Variables

Many dependent variables in cross-national research are either qualitative [assuming discrete values] or limited [they assume continuous values within some range]. The choice whether to measure a variable in a qualitative or continuous way is often controversial. Bollen and Jackman [1989], for example, argue that difficulties in classifying some political regimes speak in favor of using continuous scales because ‘Dichotomizing democracy blurs distinctions between borderline cases.’ In contrast, Przeworski et al. [in press] prefer to treat regimes dichotomously or multinomially.

When the dependent variable is either qualitative or limited, we need to use nonlinear models. Whether we give political regimes the values of 0–1 [as do Przeworski et al. in press] or 1–100 [as does Bollen 1980], it remains that the value on the dependent variable cannot exceed its maximum when the independent variable[s] tend to infinity, and it cannot fall below its minimum when these variables tend to minus infinity. Linear models, at best, can provide an approximation within some range of the independent variables.

The standard model when the dependent variable is multinomial is

[1]  Pr(Y = j | X = x) = F(x)

where j=0, 1, … , J−1 and F is the cumulative distribution function. Since such models are covered by any standard textbook we need not present them here. Such models can be applied to panel data unless the number of repeated observations is large.
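
As a sketch of estimating such a multinomial model, assuming the statsmodels library and synthetic data (neither the library nor the data come from the original article):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)                      # a single independent variable
X = sm.add_constant(x)                      # add an intercept column

# Dependent variable with J = 3 discrete values (e.g., regime categories 0, 1, 2),
# generated so that higher x makes higher categories more likely.
latent = 1.5 * x + rng.logistic(size=n)
y = np.digitize(latent, bins=[-1.0, 1.0])   # 0, 1, or 2

model = sm.MNLogit(y, X).fit(disp=False)    # Pr(Y = j | X = x) with a logistic F
print(model.summary())
```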

‘Event history analysis’ is one particular class of nonlinear models applied in cross-national research. In such models the dependent variable is an event, such as a revolution, regime transition, or a policy adoption. The general model is

[2]  Pr(Y(t + dt) = j) = f(y(t), x(t)),  dt → 0

Most often, such models can be conveniently estimated as

[3]  log S(t) = log(1 − F(t))

where S(t) is the survival function, or the probability that an event lasts beyond time t, and F(t) is the cdf.

For dichotomous dependent variables, logit and probit give very similar results. For multinomial variables, it is often assumed that the errors are independent across the values of the dependent variable, which leads to a logit specification. But this implies a strong assumption, namely the independence of irrelevant alternatives, which rarely holds in practice. Multinomial probit, in turn, requires computing multiple integrals, which was until recently computationally expensive. Alternatives would be semi- and nonparametric methods.

The distributions that are commonly used in estimating survival models include the exponential, Weibull, logistic, and Poisson distributions. But, except for the Weibull and exponential, it is very difficult to distinguish statistically between such distributions.
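
A minimal sketch of estimating a survival function nonparametrically, assuming the lifelines library and made-up duration data (the library choice and data are assumptions; the article itself discusses parametric distributions such as the exponential and Weibull):

```python
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(2)
# Durations until an event (e.g., a regime transition), some right-censored.
durations = rng.exponential(scale=10.0, size=100)
event_observed = rng.random(100) < 0.8      # True if the event was observed, False if censored

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=event_observed)

# S(t): probability that an event lasts beyond time t; F(t) = 1 - S(t).
print(kmf.survival_function_.head())
```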

Methods for studying qualitative dependent variables are now standard textbook fare, but what warrants emphasis is that the traditional distinction between ‘qualitative’ and ‘quantitative’ research is becoming increasingly obsolete. Phenomena such as social revolutions may be rare, but this just means they occur with a low probability. They can, nevertheless, be studied systematically using maximum likelihood methods.


URL: //www.sciencedirect.com/science/article/pii/B0080430767007543

Experimental research

Jonathan Lazar, ... Harry Hochheiser, in Research Methods in Human Computer Interaction [Second Edition], 2017

2.2.4 Typical Dependent Variables in HCI Research

Dependent variables frequently measured can be categorized into five groups: efficiency, accuracy, subjective satisfaction, ease of learning and retention rate, and physical or cognitive demand.

Efficiency describes how fast a task can be completed. Typical measures include time to complete a task and speed [e.g., words per minute, number of targets selected per minute].

Accuracy describes the extent to which the system or the user makes errors. The most frequently used accuracy measure is error rate. Numerous metrics for measuring error rate have been proposed for various interaction tasks, such as the “minimum string distance” proposed for text entry tasks [Soukoreff and MacKenzie, 2003]. In HCI studies, efficiency and accuracy are not isolated but highly related factors. There is usually a trade-off between efficiency and accuracy, meaning that, when other factors are equal, achieving a higher speed results in more errors and ensuring fewer errors lowers the speed. Consequently, any investigation that measures only one of the two factors misses a critical side of the picture.
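
For illustration, a minimal sketch of a minimum string distance (Levenshtein) computation and an error rate derived from it; the exact normalization published by Soukoreff and MacKenzie differs in detail, so treat this as an approximation rather than their metric:

```python
# Minimum string distance (Levenshtein) between the presented and transcribed text:
# the smallest number of insertions, deletions, and substitutions needed to
# transform one string into the other.
def msd(presented: str, transcribed: str) -> int:
    m, n = len(presented), len(transcribed)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if presented[i - 1] == transcribed[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def error_rate(presented: str, transcribed: str) -> float:
    """Error rate as MSD normalized by the longer of the two strings."""
    return msd(presented, transcribed) / max(len(presented), len(transcribed)) * 100

print(error_rate("quickly", "quickly"))             # 0.0
print(round(error_rate("quickly", "quikcly"), 1))   # two characters transposed
```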

Subjective satisfaction describes the user's perceived satisfaction with the interaction experience. The data is normally collected using Likert scale ratings [e.g., numeric scales from 1 to 5] through questionnaires.

Ease of learning and retention rate describe how quickly and how easily an individual can learn to use a new application or complete a new task and how long they retain the learned skills [Feng et al., 2005]. This category is less studied than the previous three categories but is highly important for the adoption of information technology.

Variables in the fifth category describe the cognitive and physical demand that an application or a task exerts on an individual, or how long an individual can interact with an application without significant fatigue. This category of measures is less studied, but it plays an important role in technology adoption.


URL: //www.sciencedirect.com/science/article/pii/B9780128053904000029

Evaluation

Beverly Park Woolf, in Building Intelligent Interactive Tutors, 2009

6.1.4.3 Select Control Measures

Once dependent variables are chosen, control measures, or groups that do not receive the experimental condition, are selected to ensure that dependent variables are measured against learners within alternative conditions. The definition of a control group is problematic, especially if the control group uses no new learning method and undergoes no change in pedagogy, while the treatment group moves to a new room with additional personnel and uses animated modules and so forth. Merely presenting software as new and exciting can sway the results of an evaluation. Control measurement should be principled, based on theoretical approaches to performance. Controls are built in before performing an experiment or afterward statistically [e.g., by adjusting for student mean IQ, per capita income, and ethnicity]. In reality, this is not always possible. Control is sometimes not possible [e.g., in medical school]. Yet uncontrolled conditions result in unanticipated interactions across settings, materials, classroom dynamics, and teacher personalities. Not collecting aptitude differences means evaluators cannot draw conclusions about treatment effects or aptitude-treatment interactions, as one type of person may perform better in one environment than in another.

Evaluation studies should be double-blind [neither evaluators nor students know exactly what is being measured]. Evaluators should be people outside the development/research group, and the design should use stratified samples of subjects. Evaluations often use small sampling pools, which limit the ability of a sample to accurately represent a population. A large student study [more than 100 students], however, provides improved prediction for larger populations.


URL: //www.sciencedirect.com/science/article/pii/B978012373594200006X

Chemometrics in Food Chemistry

Frank Westad, ... Federico Marini, in Data Handling in Science and Technology, 2013

7.3 iPLS

When the dependent variables are homogeneous and defined as a function of a continuous parameter, such as in spectra or chromatograms, the high correlation among neighbouring predictors makes it more meaningful to select groups of variables rather than a single variable at a time. One way of doing so in the context of latent variable-based methods is to use the so-called interval approach [45]. Since a full chapter of this book is devoted to interval methods [Chapter 12], the reader may refer to that chapter for a detailed coverage of the topic.
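
A rough sketch of the interval idea, assuming scikit-learn's PLSRegression and synthetic spectra (the interval width, data, and cross-validated scoring here are illustrative assumptions; the chapter's own iPLS algorithm is more elaborate):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n_samples, n_wavelengths = 80, 200
X = rng.normal(size=(n_samples, n_wavelengths))      # synthetic "spectra"
y = X[:, 60:80].mean(axis=1) + rng.normal(scale=0.1, size=n_samples)

interval_width = 20
scores = []
for start in range(0, n_wavelengths, interval_width):
    X_interval = X[:, start:start + interval_width]  # one contiguous block of variables
    pls = PLSRegression(n_components=2)
    score = cross_val_score(pls, X_interval, y, cv=5, scoring="r2").mean()
    scores.append((start, score))

# The interval covering columns 60-79 should score best on these synthetic data.
print(max(scores, key=lambda s: s[1]))
```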


URL: //www.sciencedirect.com/science/article/pii/B9780444595287000041

Useful observational research

Brian Wansink, in Context, 2019

5.3.1 Developing a coding sheet

To develop a coding sheet, consider the earlier example of trying to determine what slim people do differently at buffets than heavy people. Maybe heavier diners sit closer to the food, face the food [instead of facing away], use forks instead of chopsticks, and maybe they use large plates. Maybe they are also less selective about what they take so they take more different types of foods, and maybe they end up wasting more than other diners.

These were all simply observations that were collected during the storytelling observations. Some might have theoretical explanations [for example, it is more convenient to serve yourself without browsing first, sit closer, or use a fork], but others might involve observations that seem to recur with no understandable explanation at the time. Our view is to include such items because, with enough data, a pattern may provide a currently elusive explanation. Consider four types of information that you should code: [1] Demographic and identification data, [2] environmental factors, [3] independent behaviors, and [4] dependent variables.

1. Demographic and Identification data. These include the basic demographic data along with the time, location, and date it is being collected. A key feature of some research involving eating is that it can be useful to have the height and weight of people so that an approximate body mass index can be calculated. Having this data, for example, was important when conducting research on what slim diners do differently at buffets than heavy diners. A researcher or observer can be coached to better estimate heights and weights, or they could also use benchmarks [height marks on buffet lines or pressure sensitive mats]. In addition, visual display cards, such as the Stunkard Visual Figures Scale can be copied onto the back of coding sheets, enabling a researcher to circle the shape of the observed person [see Fig. 5.4].

Fig. 5.4. Stunkard's visual figures scale.

2. Environmental Factors. For example, if it is believed that the books in the proximity of a work area will either prime a person to work or distract them from working, it is important to code their type, their distance, and their visibility. As the examples in both Figs. 5.2 and 5.3 show, environmental factors on coding sheets can also include who a person is with and how the others around them are acting.

3. Independent Behaviors. If it is believed that the distance a diner sits from a buffet table, the size of the plate they select, or the direction they face might influence their behavior [and could be related to their overall BMI in the long term], it is important to code these. Independent behaviors are often difficult to determine because they are not always obvious before the fact. Using the storytelling method is one of the best ways to identify these possible independent behaviors. These should be objectively observable “Yes/No” questions whenever possible [did they use chopsticks], or objectively measured [number of feet the person is sitting from the nearest part of the buffet].

4. Dependent variables. When an observation study goes to the stage of being quantitative with a large number of variables, there is generally a specific spotlighted variable that is of interest. It could be how much money a person spends, how long they stay at the location, how much food they waste, whether they make a purchase, whether they order healthy food, and so forth. This is the most critical variable to measure accurately, and it is best if it can be collected in multiple ways. Consider the following multiple ways data can be collected:

Wasting Food: What percent is wasted? What foods are wasted? How many different types are wasted?

Purchase Behavior: Whether or not a purchase is made [[ ] Purchases; [ ] Does not purchase]? How much is spent? How many items are purchased?

How long should a coding questionnaire be? Some researchers include only coding variables that can be theoretically justified. This is generally too few, because it allows only for confirmation and not for new discoveries. Other researchers indiscriminately list nearly everything they can observe, but this extreme also has problems: they often do not know how to analyze the data in an insightful way, and they run the risk of false positives [something being significant by chance]. There are a number of ways to minimize this risk and still generate unexpected insights. If an exploratory finding makes theoretical sense and is consistent with other behaviors you would expect to see, then it is probably not a false positive. For example, suppose it is shown that people who face the buffet make more trips to the buffet. It may be that they do this because the buffet is more visible [a theoretical explanation], and if people who have an unobstructed view of the buffet also make more trips, this would be another consistent behavior that supports the finding.


URL: //www.sciencedirect.com/science/article/pii/B9780128144954000052

Diffusion and Random Walk Processes

R. Ratcliff, in International Encyclopedia of the Social & Behavioral Sciences, 2001

1.1 Scaling and Distribution Shape

The two dependent variables, response time and accuracy, have different scale properties. Response time has a minimum value and its variance increases as response time increases. Accuracy is bounded at probabilities 0.5 and 1.0, and as probability correct approaches 1.0, variance decreases. Diffusion models account for these scale properties automatically as a result of the geometry of the diffusion process. When mean drift rate is high, the probability of a correct response is near 1.0 and decision processes approach the correct boundary quickly with little spread in arrival times [e.g., the processes with mean v1 in Fig. 1]. When mean drift rate is nearer zero [processes with mean v2 in Fig. 1], variability leads some processes to hit one boundary, other processes to hit the other boundary, and accuracy nears 0.5; the arrival times at boundaries are long and highly variable. In addition to these scale properties, the geometry of the diffusion process also gives response time distributions skewed to the right; as mean response time increases [e.g., from drift rate v1 to v2], the fastest responses slow a little and the slowest responses slow a lot.
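
A minimal simulation sketch of this geometry, using a simple discretized random walk between two boundaries (the parameter values are arbitrary assumptions, chosen only to reproduce the speed-accuracy pattern described above, not taken from the article):

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_trials(drift, n_trials=2000, boundary=1.0, noise=1.0, dt=0.002):
    """Random-walk approximation of a diffusion process between boundaries at
    +boundary (correct response) and -boundary (error response).
    Returns (proportion correct, mean decision time for correct responses)."""
    n_correct, correct_times = 0, []
    for _ in range(n_trials):
        x, t = 0.0, 0.0
        while abs(x) < boundary:
            x += drift * dt + noise * np.sqrt(dt) * rng.normal()
            t += dt
        if x >= boundary:
            n_correct += 1
            correct_times.append(t)
    return n_correct / n_trials, float(np.mean(correct_times))

# High mean drift rate (like v1): accuracy near 1.0, fast arrival times with little spread.
print(simulate_trials(drift=3.0))
# Mean drift rate near zero (like v2): accuracy closer to 0.5, slow and variable arrival times.
print(simulate_trials(drift=0.1))
```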


URL: //www.sciencedirect.com/science/article/pii/B0080430767006203

Population Model: Calibration and Validation

Donald W. Boyd, in Systems Analysis and Modeling, 2001

3.6.1.1 Selection of Model Variables

Balance equation error for each system [or subsystem] is minimized by including all variables that significantly impact the balance equation. Whether to include a variable can be assessed with the assistance of a domain expert or with a statistical method such as experimental design through analysis of variance [27].

Calibration of Dynamic Forms

Each endogenous dependent variable is defined by a dynamic form composed of those independent variables that significantly impact its value. Regression coefficients must be selected so as to minimize regression error. When all variables that measurably influence model behavior are supported by primary data, standard linear regression techniques are applicable; otherwise, reverse regression techniques apply.

Primary Data

Primary data provide a standard by which model output is validated. However, the processes of measurement [or counting], recording, and compilation of primary data also have potential to introduce error. Error control measures include upgrading collection techniques: less manual processing and more use of automation and advanced technologies. If primary data exist for all variables of a system [subsystem], seldom, if ever, will the balance equation exhibit error-free balance. To compensate, a composite error term can be included in the system [subsystem] balance equation to account for imbalances. The size of the error term provides a relative measure of accuracy for the primary data.

Secondary Data

Estimation errors are introduced when secondary data are substituted for missing primary data. Exact error magnitudes cannot be determined in the absence of historical data. Because secondary data must pass TT tests, the process of structuring and calibrating the model keeps error magnitudes in check. Furthermore, estimation error tends to be distributed over all secondary data, making minimal impact on those variables having relatively great magnitudes.

What type of variable is an outcome measure?

An outcome measure [also known as a dependent variable or a response variable] is any variable recorded during a study [e.g. volume of damaged tissue, number of dead cells, specific molecular marker] to assess the effects of a treatment or experimental intervention.

Which variable is the outcome variable?

The outcome variable is also called the response or dependent variable, and the risk factors and confounders are called the predictors, or explanatory or independent variables. In regression analysis, the dependent variable is denoted "Y" and the independent variables are denoted by "X".
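
As a small illustrative sketch of this notation, assuming numpy and made-up data, Y can be regressed on X by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=50)                      # independent (explanatory) variable
Y = 2.0 + 0.8 * X + rng.normal(scale=0.5, size=50)   # dependent (outcome) variable

# Ordinary least squares fit of Y = b0 + b1 * X.
slope, intercept = np.polyfit(X, Y, deg=1)
print(f"Y ~ {intercept:.2f} + {slope:.2f} * X")
```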
