The development of discrimination in a simple locomotor habit

The present experiment is concerned with the course of acquisition of a simple discrimination habit in the white rat, and with the relationship of that course to the phenomena resulting from the simple reinforcement and non-reinforcement of responses to differing stimuli. Recent emphasis, theoretical and experimental, on the acquisition of discriminations in the choice-reaction situation, the ‘instrumental’ or ‘operant’ conditioned response, and the conditioned reflex has made a trial-by-trial description of the process desirable. The data obtained clearly indicate that the principles first observed in studies of the conditioned reflex are paralleled to a striking extent in the more complex behavior obtained by means of a quite different set of operations.

A number of experiments (from the Brown laboratory) have been concerned with that form of behavior which has been variously termed ‘operant conditioning,’ ‘trial and error,’ ‘problem solving,’ and ‘Thorndike’s rewarded learning.’ The response which has been investigated is that of running down a small runway to a food box when the door of a starting box has been opened. The latent period has been chosen as a measure of the strength of the response. This is defined as that interval, measured to the nearest fifth of a second, between the opening of the door and passage across a mark four inches beyond the door of all except the animal’s tail. The course of acquisition, extinction, and spontaneous recovery of the response has been studied (6). Data on the effect of spacing of trials (4), on retention after primary acquisition and extinction (5), and on external inhibition and disinhibition have been collected (3). The effect of random reinforcement on subsequent extinction has been determined (2), and in the experiment here reported it is the course of acquisition of discrimination which has been investigated.

It is possible to analyze any extended sample of behavior into a series of correlations of stimulus and response, termed ‘reflexes,’ or ‘S-R bonds.’ When such analysis is made, it is found that such reflexes are chained, that is to say, the response to one stimulus may serve to produce the stimulus to the next response. This linkage may occur in either of two ways: (1) the response may set up internal stimulation which elicits the next response, or (2) the response may so move the organism with respect to the environment that new external stimuli may act upon the organism’s receptors, thus eliciting the next response.

Any such chain of stimulus-response correlations can be reinforced, and the strength of each member consequently augmented. But it has been found that not all members of a chain are affected alike by reinforcement at the termination of the chain. Most prominent of the differences which may occur is this: The more remote from reinforcement the occurrence of a correlation, the less the augmentation of its strength. This effect has been most extensively treated by Hull (11) in his work on the goal gradient. Skinner (14) has also discussed the problem.

The behavior studied in the present series of investigations constitutes such a chain of correlations, and the member of the chain whose strength was studied may be denoted as the response of running out on the alley, which is elicited by the change produced by the opening of the door in front of the starting box. In an extensive series of preliminary and purely exploratory experiments carried out in the academic year 1939-1940, it was found that a later member of the chain displayed many interesting properties, and that a second measure was necessary for an adequate description of the development of discrimination. The stimulus of this correlation is specified as the environmental and internal energy-changes which affect the organism as it runs down the alley to the food box, and the response is full entrance into the food box. The measure which has been made is termed running time, and is defined as that interval, in seconds, between the termination of the latent period, and the time at which the animal has entered the food box to such an extent that when the door of the food box is closed, only the tail of the animal remains outside. This measure is taken to represent the latent period of the second reflex.

When these considerations have been taken into account, it will be evident that when experimentally introduced discriminative stimuli are present at the opening of the door and through the remainder of the apparatus, the behavior displayed consists of two chains of correlations. Furthermore, in measuring the latent period and the running time, the strengths of two separate stimulus-response correlations in each chain are determined. Diagrammatically, we may present the situation as follows:

Chain A:

S (door opening, D alley) ----- R (running out) - - - S (D alley and food box) ----- R (entrance in food box) - - - S (food) ----- R (eating)

Chain B:

S (door opening, D alley) ----- R (running out) - - - S (D alley and food box) ----- R (entrance in food box) - - - S (food) ----- R (eating)

where D and D are the discriminative stimuli.2

The present problem, then, may be stated as follows: to determine the course of changes in strength of two members of each of two chained correlations of stimulus and response as a function of differential reinforcement of the chains.

METHOD

Apparatus

The runway described by Graham and Gagné (6) was employed, with certain modifications. A longitudinal diagram is presented in Fig. 1.

This runway has the dimensions 3 feet by 2 inches by 3/4 inches. One end is attached by a swivel arrangement to a point just below the center of the sill of the door to the food box; the other end, loose, is bevelled to fit into a notch cut into the base of the starting door. This loose end is firmly held in its groove by a rubber roller, under considerable tension from a strong spring. A pulley and rope arrangement attached to the holder of the roller enables the experimenter to release the runway and to rotate it about its longitudinal axis. Of the two running surfaces of the runway, one is painted black, one white; the sides are gray.

The gray permanent starting box has the outside dimensions 5 1/2 inches wide, 7 1/4 inches long, and 4 3/4 inches high. It is open at both ends, and the top is covered with a wire screen. One open end fits against the frame of the starting door; the other may be closed by a door hung from below on a stiff hinge. Three food boxes were utilized, one white, one black, and one gray. (The latter was used in preliminary training.) Aside from brightness, they are identical. The outside dimensions of these boxes are 5 1/2 inches wide, 7 1/4 inches long, and 5 inches high. The top is screened, and one end is open. The open end fits against the frame of the door through which the animal passes. On the closed end is the food receptacle which is a metal shelf 1 1/2 inches wide, 1 inch deep, and 5/8 inches above the floor. The front of this shelf is screened by a piece of metal 1 1/2 inches wide and 1 inch high. A hole cut in the back of the box permits the loading of this shelf, and examination of its contents. A handle is attached to the box above the hole.

The doorways, 5 1/2 inches by 5 1/2 inches, are cut in the center of the lower face of large screens at each end of the runway. These screens are permanently built on the box holders and door frames, and are painted gray. The doors themselves are likewise gray, and open to a height of 5 inches. They can be quietly and rapidly raised or lowered by a pulley, string, and counterweight arrangement manipulated by the experimenter.

The whole apparatus is enclosed in a large wooden framework, covered by cheesecloth, which forms an effective one-way screen. The only illumination is provided by a 25-watt bulb about one foot in front of the screen at the food box end of the runway, and 20 inches above the runway.

The experimental room is in a part of the building which is quiet at night, when the experiments were conducted.


Accessory apparatus included a nine-foot section of elevated runway of the same width as the experimental alley. This is painted black and white in different sections and was used in preliminary training. Two platforms, approximately one foot by one foot, were employed, one for feeding after a day’s experimental session, and one for those periods when the animal waited to be used or to be returned to the living cage.

Procedure
1. Preliminary Training
Several days prior to the beginning of preliminary training, each subject was taken off the usual unrestricted food regime employed in the rat colony, and was placed on a feeding schedule. On this schedule, each rat was permitted to eat for one-half hour every day the small cylindrical pellets of Purina dog chow employed in the experiment.

On the first day of preliminary training, the rat was taken to the experimental room and placed on the practice runway, where he was permitted to run for 10 minutes without reinforcement. He was then placed on the waiting stand for at least one minute, after which he was held in the practice food box for five minutes without reinforcement. The order in which the last two parts of the training occurred was often inverted. Finally, after another delay, the animal was placed on the food stand and allowed to eat and drink for twenty minutes. On the third habituation day, the five minutes’ handling was dropped, and approximately twenty small pieces of food were given in the food box. After the animal had eaten these, and after a short stop on the waiting stand, he was placed on the food stand and allowed to remain there until he had spent a total of twenty minutes eating. The regime of feeding twenty minutes a day, including the time of reinforcement, was maintained throughout the course of the experiment. It is believed that this procedure of setting up and maintaining a feeding schedule held the subjects’ drive at a relatively constant level throughout the experiment. The rats employed remained healthy through the experimental period although some loss of weight was observable occasionally.

2. Initial Acquisition of the Running Behavior to Black Alley and Food Box
On the first day of the experiment proper, eight reinforced trials were given on the black runway with black food box. The procedure employed was as follows. Two minutes before the start of the trial, the animal was placed in the starting box. The food box was baited with a quantity of food which required approximately one and one-half minutes to eat (ca. 0.35 gm.). This amount varied slightly from animal to animal. At the beginning of the trial a stopwatch was started as the starting door was opened. At the moment when all the animal’s body except the tail had crossed a point three inches from the starting box, the watch was stopped, and another started. As soon as the animal had traversed the runway and entered the food box according to the criterion of entrance (vide supra), the door was closed and, at the same time, the door to the starting box was closed. At one minute and twenty-five seconds, uneaten food was taken from the food box, and the box was carried to the other end of the apparatus. The experimenter now transferred the animal to the starting box. The food box was returned to its original position and reloaded. About two seconds before the end of the two-minute interval, the stopwatch was stopped, reset, and the next trial was begun.

3. Discrimination Training
During the main course of the experiment, every fourth trial was reinforced. Thus, training was divided into what may be termed cycles of three non-reinforced trials followed by one reinforced trial. The trial procedure in this section of the experiment was the same as in initial acquisition of the black habit, with the following exceptions: (1) Immediately before and after a reinforced trial, the alley was turned over, and the food box was changed after the animal had been transferred to the starting box (except in the case of Group III). (2) Food was placed in the food box only before positive trials. Before negative trials the experimenter went through the motions of loading, thus counterfeiting the attendant sounds. (3) If after two minutes3 from the inception of the measurement of latent period the animal had not left the starting box, the door was closed, and the watch was reset and restarted. The latent period for such a trial was arbitrarily recorded as ‘infinite.’ In the thirty-second period immediately thereafter, the runway was manipulated, and the food box appropriately handled. At the thirty-second interval the door was opened for the next trial. Finally, (4) if, after two minutes from the inception of the measurement of running time the animal had not entered the food box, he was taken from wherever he might be, and placed in the starting box, the door to which was then closed. From this point, the same procedure was carried out as in (3) above. Under the conditions of (4) running time was recorded as ‘infinite.’
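The recording rule of exceptions (3) and (4) may be sketched as follows. This is a minimal illustration, not the procedure's own notation: the function name `record_time` and both constants are hypothetical, the fifth-of-a-second resolution is taken from the definition of the latent period given earlier and is only assumed to apply to running time as well, and treating a reading of exactly two minutes as a failure is likewise an assumption.

```python
import math

TIMEOUT_S = 120.0    # two-minute limit stated in the procedure
RESOLUTION_S = 0.2   # assumed timing resolution: a fifth of a second


def record_time(elapsed_s, responded):
    """Return the recorded measure, in seconds, for one trial.

    elapsed_s -- stopwatch reading in seconds
    responded -- True if the response criterion was met
    """
    if not responded or elapsed_s >= TIMEOUT_S:
        return math.inf  # the trial is recorded as 'infinite'
    # Count whole resolution steps, then convert back to seconds.
    steps = round(elapsed_s / RESOLUTION_S)
    return round(steps * RESOLUTION_S, 1)
```

A reading of 3.27 seconds would thus be recorded as 3.2, and any trial without a response within the limit as an infinite value, which later analysis must handle without averaging.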

The training of the four experimental groups was as follows:

For Group I, six cycles daily for four successive days of three trials unreinforced on the black alley and black food box; one trial reinforced on the white alley and white food box.

For Group II, six cycles daily for four successive days of three trials unreinforced on the white alley and white food box; one trial reinforced on the black alley and black food box.

For Group III, six cycles daily for four successive days of three trials unreinforced, one trial reinforced, all on the black alley and black food box.

For Group IV, four cycles daily for six successive days of three trials unreinforced on the black alley and black food box; one trial reinforced on the white alley and white food box.
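The four training schedules may be laid out trial by trial as in the sketch below. The class `Trial`, the table `GROUPS`, and the function `schedule` are hypothetical names introduced only for illustration; the numbers of cycles and days and the assignment of alleys are those stated above, with the alley and the food box always taken to match in brightness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    day: int
    cycle: int        # cycle number within the day
    alley: str        # "black" or "white" (food box matches the alley)
    reinforced: bool


# (cycles per day, days, negative alley, positive alley), from the text.
GROUPS = {
    "I": (6, 4, "black", "white"),
    "II": (6, 4, "white", "black"),
    "III": (6, 4, "black", "black"),
    "IV": (4, 6, "black", "white"),
}


def schedule(group):
    """List every discrimination-training trial for one group, in order."""
    cycles_per_day, days, negative, positive = GROUPS[group]
    trials = []
    for day in range(1, days + 1):
        for cycle in range(1, cycles_per_day + 1):
            # Three unreinforced trials, then one reinforced trial.
            trials += [Trial(day, cycle, negative, False)] * 3
            trials.append(Trial(day, cycle, positive, True))
    return trials
```

Under this layout every group receives 24 cycles, or 96 trials, in all; Group IV differs from Group I only in how those cycles are distributed over days.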

Groups I and II are the fundamental groups, in that they offer the possibility of comparison of acquisition of discrimination under two conditions: (1) where the initially strong chain is extinguished and the related chain reinforced (Group I); and (2) where the initially strong chain is reinforced and the related habit extinguished (Group II). Fifteen rats were run in each group.

Group III offers an important control on (1) the possibility of the development of a temporal discrimination, and (2) the effect produced when generalization of reinforcement and non-reinforcement is necessarily complete. Seven rats were utilized.

Group IV was run with a view to examining for possible effects of massed practice. Nine rats were run in this group.

Each rat’s daily run, with exceptions to be noted below, began within one-half hour of 24 hours after the beginning of the preceding day’s run. However, in the cases of four rats in each of groups I, II, and III, and one rat in Group IV, a lapse of 48 hours occurred between one pair of daily runs. These rats were fed 20 minutes on the intervening day. None of the records of these animals shows any appreciable differences from the records of the other animals in the same group which could be attributed to the longer interval.


The Animals
Seventy-one male albino rats, members of the inbred Wistar strain which constitute the colony of the Psychological Laboratory were used in the present experiment. Of these, twenty-five were discarded, ten for failure to leave the starting box on the first trial of initial acquisition, eight for failure to run on a positive trial, either in the initial training series or in a positive trial of the first few discrimination cycles, and four because of error in experimental handling.

The animals were obtained from thirteen litter groups, each constituted of double first cousins, and the members of each experimental group were drawn as evenly as possible from all litter groups. The average ages of the rats in each group on the day on which discrimination training started were: Group I, 82 days; II, 82 days; III, 84 days; IV, 83 days. The experimental group membership of any individual rat was not decided until after initial acquisition training was complete, and then was determined by lot. This ensured that the groups were comparable.

The experiment was carried out through the fall and winter of 1940-1941.

In summary, the following important variables have been held constant for all groups: drive, amount of reinforcement, trial interval, initial response strength, and environmental stimulation. Magnitude of response has been measured as a function of the correlation of reinforcement and non-reinforcement with the distinctive stimuli.

Results

General

In accordance with the procedure of Graham and Gagné (6) all measurements have been converted into logarithms. The rationale of this conversion has been fully discussed elsewhere (6), and need not concern us here.

Since the failure of response to occur within two minutes has been arbitrarily recorded as representing a latent period or running time of ‘infinite’ duration, it was not possible to determine a mean log latent period and mean log running time for many non-reinforced trials. However, means have been computed for (1) all reinforced trials, (2) the initial trial of any day, and (3) a sample of unreinforced trials, especially in Group III, where an ‘infinite’ measure was not obtained. The median has been determined for all trials.

Examination of the data shows that the median of the measurements closely approximates the mean of the logarithmic distributions. From consideration of this close similarity of the measures, the validity of the conversion of raw data to logarithmic values appears evident. The median, then, represents satisfactorily the central tendencies of the distributions obtained. Similarly, the quartile deviation was determined instead of the average deviation. Where Q3 was measured as infinite, it was not, of course, possible to measure the quartile deviation, so the value (Median-Q1) was employed.

In the light of these considerations, and because the median is the only measure of central tendency available on all trials, the graphs and discussions presented will be based upon the median values.
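The treatment of ‘infinite’ measures described above may be illustrated by a short sketch. Several points are assumptions rather than statements of the method actually used: logarithms are taken to base 10 (which agrees with the two-minute limit falling near 2.08 log units), quartiles are taken as medians of the lower and upper halves of the ordered values, and the names `log_measure` and `summarize` are hypothetical.

```python
import math
from statistics import median


def log_measure(seconds):
    """log10 of a time in seconds; 'infinite' trials stay infinite."""
    return math.inf if math.isinf(seconds) else math.log10(seconds)


def summarize(times_s):
    """Median and spread of log times on one trial across animals.

    Needs at least two values. Quartiles are medians of the lower
    and upper halves (an assumed convention, not given in the text).
    """
    logs = sorted(log_measure(t) for t in times_s)
    half = len(logs) // 2
    med = median(logs)
    if math.isinf(med):
        # Spread is left undefined when the median itself is infinite.
        return med, math.nan
    q1 = median(logs[:half])
    q3 = median(logs[-half:])
    if math.isinf(q3):
        spread = med - q1        # fallback described in the text
    else:
        spread = (q3 - q1) / 2   # quartile deviation
    return med, spread
```

The median remains defined so long as fewer than half of the animals fail to respond, which is why it, and not the mean, is available on every trial.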

 

The Data
1. Complete Graphs.4

Fig. 2. Complete learning curves, Groups I (B-, W+) and II (B+, W-). The vertical lines separate daily runs. “Infinite” latent periods and running times are arbitrarily plotted on the horizontal line at the top of each graph. This is identified by the infinity sign which interrupts it. The logarithms of the measures are plotted as a function of the number of reinforcements and non-reinforcements.

a. Group I, median log latent period.
b. Group I, median log running time.
c. Group II, median log latent period.
d. Group II, median log running time.

Figures 2 and 3 present learning curves for each of the four groups.5 On each graph, the vertical lines show the point at which each day’s training ceased. A horizontal line which is marked ‘∞’ and which originates approximately at the ordinate value of 2.08 log units (120 under other conditions for the same number of trials), (4, 5). The form of the curve, however, approximates that found by Gagné under somewhat similar conditions.

Fig. 5. The acquisition of the response to the positive stimulus which is measured by log running time. To the left of the black line, median times for the first twelve trials, reinforced and unreinforced, are plotted. To the right, only the magnitude of response on the positive trials is plotted. The curves are visually fitted to the data.

There are a number of significant findings in the data for all groups for the first cycle. They are:

1. There are no significant group differences on the first three unreinforced trials despite the fact that the white alley and food box are now presented to members of Group II.

2. On the first unreinforced trial, the median latent period for all groups is almost identical with that of the last trial some 24 hours before.

3. After one non-reinforced trial, all groups run more rapidly. Subsequently, extinction takes place.

4. On the 12th trial (the first reinforced trial of the first discrimination cycle), there are no differences between the groups which are attributable to the experimental handling. The considerably slower time shown by Group IV on this trial is certainly significant, as will appear later, but the writer can give no account of it. This group, it should be emphasized, has had to this point precisely the same handling as Group I. However, there is no evidence of discrimination. After the first trial, the form of the curve is like that described by Gagné (6) in extinction. The form of the extinction curve (trials 9-11) suggests that the 24 hours intervening between acquisition and extinction introduce complications which must be described in any theoretical account. Discussion of this effect is reserved.

b. Running Time. Through the course of eight trials, the running time drops in a regular manner from 10 seconds to 1.8 seconds, a change of 5.6 : 1. This is but slightly larger than the similar ratio for the latent period. However, the running time has by no means reached a stable level. The curve of acquisition for running time, furthermore, is different in form from that of latent period; it is not possible to fit both data with the same curve.

Striking differences seem to appear in the first cycle, despite the fact that all four groups start with approximately the same running time of some 2.8 seconds and that through trials 9, 10, and 11, Groups I, II, and III have identical handling. However, examination of the distribution determining each point for the four groups on trials 10 and 11 indicates that the apparent differences in these groups are artefactual. Accordingly, the data have been visually fitted with curves which describe them satisfactorily. The observations then may be summarized:

1. Groups I, III, and IV systematically extinguish over the first cycle. However, Groups I and IV run more rapidly in the presence of the new discriminative stimuli than does III, which continues on black. This difference might be expected from the phenomena of generalization of the effects of non-reinforcement.

2. Group II clearly behaves differently from the other groups on trials 10 and 11. On trial 10, it runs faster than the others, and, indeed, faster than on trial 9. In this respect, its behavior resembles that of all groups in latent period. Further, after the second non-reinforcement, its time is strikingly longer: RT (for trial 11) : RT (for trial 10) = 11 : 1. And with the presentation of the discriminative stimulus, there is an equally striking speed-up of the order 3 : 1. This group, then, seems to respond differently on the first presentation of the positive stimulus, after extinction of response to the negative.

In summary, Groups I and IV show a slight differentiation of response to the newly introduced stimuli; Group II displays discrimination to a marked degree.

3. Changes of Strength in Response to the Positive (Reinforced) Stimulus.

The graphs of Figs. 4 and 5 present, to the right of the vertical black line, for each group, the changes occurring in latent period and running time as a function of number of reinforcements of the positive stimulus. Thus the unreinforced trials of each cycle are omitted. The adequacy of the fits is evident.

a. Latent Period:

1. The same curve fits the data of Groups I, III, and IV. For Group IV, the curve is shifted one cycle to the left, in order to fit the initial point. Groups I, III, and IV have in common the following history: initial training on black alley, black food box followed by non-reinforcement on black alley, black food box.

2. Group II, which continues to be reinforced on black, shows consistently slower improvement, a finding suggestive of greater generalization of the effect of non-reinforcement from the weaker response to the stronger. This is consonant with Williams’ (16) findings in an experiment on the interaction of habits differing slightly in both stimulus and response.

3. Systematic deviations from the fitted curves appear in Groups I, II, and III. These appear only for the last two or three positive trials of any day. Such deviations, with one exception, do not appear in the data of Group IV. The exception is on the last positive trial of a day’s run. Group II shows this effect most persistently.

4. At the end of the present training, all groups have come to a final stable level.

b. Running Time:

1. The same curve fits the data of Groups I, II, and IV.

2. The curve of Group III, where there is no discriminatory stimulus, shows a much more rapid decrease in running time. The cumulative effect of the reinforcements, then, is greater for these animals.

3. At the end of training, Group III alone has reached a final stable level of performance.

4. Changes of Strength in Response to the Negative Stimulus.

The changes in strength of the responses to the negative stimuli are more complicated than those to the positive. Intervening between presentations of the positive stimuli are one reinforcement of the positive and three non-reinforcements of the negative stimulus. Between successive presentations of the negative stimuli there is no such consistency. Although between every pair there occurs one non-reinforcement of response to the negative stimuli, between every third pair there also occurs a reinforcement of response to the positive. Consultation of the learning curves (Figs. 2 and 3) shows that this difference in treatment has two effects. The first is to increase on the first trial of a cycle the strength of the response measured by running time throughout the training, and the strength of that measured by latent period in only the earlier part of the training. The mechanism whereby this occurs seems to be generalization of reinforcement. Performance on subsequent negative trials takes the form of an extinction curve. A reversal of effect occurs in the response measured by latent period in the latter section of training. Despite this complication, it is necessary to make arbitrary measure of the general level of response to the negative stimulus through each cycle. The median log latent period and median log running time of all runs in response to the negative stimuli in each cycle have been found satisfactory for the purpose. These values are plotted in the graphs of Fig. 6.
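The per-cycle level of response to the negative stimulus may be computed as in the sketch below. It assumes that ‘all runs’ means the pooled unreinforced trials of every animal in the group within the cycle; the function name `negative_level` and the layout of its argument are hypothetical.

```python
import math
from statistics import median


def negative_level(cycle_logs):
    """Median log measure over all negative-trial runs in one cycle.

    cycle_logs -- one list per animal holding that animal's log times
                  on the three unreinforced trials of the cycle;
                  math.inf marks an 'infinite' trial.
    """
    pooled = [value for animal in cycle_logs for value in animal]
    return median(pooled)
```

Pooling in this way yields a single value per cycle, at the cost of concealing the extinction curve that runs through the three negative trials within the cycle.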

Examination of these data shows conspicuous day-to-day effects. One such effect has been found previously in the present experiment: In the fifth and sixth cycles of the first days of training, there appears in the latent period measure of Groups I, II, and III a decrement in response to the positive stimulus which disappears over night. This finding appears to hold for the same measure of response to the negative stimuli; the persistence of the effect in Group II also occurs here. The writer is not prepared to state whether this effect is a consequence of the experimental manipulation of reinforcement and non-reinforcement, or of failure of complete adaptation to the handling which was necessary on every trial. The former interpretation is the more probable.

A second effect, which is seen most clearly in the present data on Group III, is what has been termed ‘warming-up.’ This phenomenon is more striking in the latent period measurements. The first few trials of any day show progressive decreases in both latent period and running time which are not attributable to reinforcement. Similar observations have been made in both conditioning (12) and learning (1). It has been considered a phenomenon of circular conditioning (7).

The third day-to-day effect, which may be identified as spontaneous recovery, is a marked over-night increase in the strength of response to the negative stimulus after discrimination has begun. Graham and Gagné (6) have found that spontaneous recovery occurs in the present situation over the course of ten minutes after extinction, but the degree of such recovery was limited. Hovland (10) has found that such spontaneous recovery occurs in conditioned responses which have been subjected to secondary extinction. These two day-to-day effects, ‘warming-up’ and spontaneous recovery, are possibly the determinants of the form of the curve of the latent period (Fig. 4) on the first cycle. If a certain amount of inhibition of reinforcement (9) is accumulated on the primary acquisition trials, and if spontaneous recovery takes place over night, a combination of this recovery with ‘warming-up’ might produce the curve which is obtained.

Fig. 6. Magnitude of the response to the negative stimuli (measured by the median of all unreinforced trials on each cycle), as a function of the number of non-reinforcements. The vertical lines on each graph separate the daily runs. Where “infinite” values have occurred, they are plotted on a horizontal line originating above 2.0 log units, and marked with the sign for infinity.
Log Latent Period: Each graph has the following smoothed curves:

1. Solid line: curve fitted to present data of group represented.
2. Dashed line: acquisition curve of each group.
3. Dotted line: curve fitted to present data, Group III.

Log Running Time:

1. Solid line: curve fitted to present data of group represented.
2. Dashed line: acquisition curve of each group.
3. Dotted line: curve fitted to present data, Group III.

a. Log Latent Period:

The data of Group III reveal no tendency toward the development of a temporal discrimination. In the absence of discriminative stimuli, each reinforcement produces so great an increment in strength that the response is maintained at a level stronger than before through three successive non-reinforced trials. Consequently, the two fitted curves draw regularly together, and from the end of the first day there is no recognizable effect of any single reinforcement or non-reinforcement on the succeeding trials. The change in strength over the course of the remainder of the experiment is gradual, and is not reflected especially in either positive or negative trials.

Through the first 14 cycles, the behavior of Group I is indistinguishable from that of Group III. There is no evidence of discrimination until the 15th cycle, when the process begins its progressive development, interrupted only by the 24-hour interval. Although the weakening of response to the negative stimulus follows cycle by cycle the negatively accelerated course typical of extinction, examination of the complete learning record reveals that within any cycle it is the first trial in which response to the negative shows the greatest decrement. From the first trial of the cycle on, the response becomes successively stronger so that by the third trial it is nearly as strong as on the succeeding trial. These two phenomena are met with in all three groups to which discriminative stimuli were presented. The finding in the latent period measurements that discrimination first appeared on the trial following reinforcement was unexpected. Occasionally the latent period on such a trial was observed to be artificially lengthened because the rat was still eating food which it had hoarded intraorally, but this was an infrequent occurrence. It is not responsible for the form which the development of discrimination has taken. The effect is genuine, and bears strong resemblance to the phenomenon which Pavlov (12) has termed ‘induction.’

Despite the fact that the acquisition curve of Group IV is generally higher than that of I and III, the curve of strength of the response to the negative stimulus through the ninth cycle is essentially the same as those of I and III. Discrimination then appears to a slight degree for four cycles, the negative trials of each cycle taking the form of very low extinction curves; in the fifteenth cycle, discrimination begins to appear as it did in the case of Group I, and develops in the same manner.

The data of Group II are complicated by the relatively great daily deviations. However, when these are taken into account, the results are not dissimilar to those of the other groups. Where the acquisition curve of this group was higher than that of the others, the curve of median response per cycle to the negative stimulus seems to be slightly lower, although the day-to-day deviations make any determination of its exact form extremely unreliable. Whatever the exact form of this curve may be, examination of the complete learning record reveals that the strengths of response on the positive and negative trials of each cycle are not equal until the eleventh cycle. During the next three cycles, the response to the positive and to the negative stimuli is approximately equally strong. On the 14th cycle, just as in Groups I and IV, discrimination clearly begins, and follows the same course of development.

b. Log Running Time:

The behavior of the running time measure of Group III parallels that of the corresponding latent period. The curves of response magnitude on both positive and negative trials draw regularly together over the course of a few cycles, even though they begin farther apart than do those describing the data on latent period. The complete learning curves also exhibit the same characteristic form: a series of extinction curves progressively diminishing in height until there are observable no differences between responses on the positive and negative trials. These findings parallel those of Skinner (14) on periodic reconditioning.

The running time data of Groups I, II, and IV are strikingly different from those of the latent period. First, discrimination appears to some extent on the first cycle, and rapidly increases in degree; there is no period of complete generalization of reinforcement before discrimination develops. Second, the responses on the negative trials of each cycle characteristically take the form of an extinction curve, and the curves of each successive cycle are displaced upwards from that of the preceding cycle. The present curves indicate the level at which these successive extinction curves occur, and consequently indicate the degree to which discrimination has developed; they reflect further the degree of spontaneous recovery occurring over the 24-hour intervals.

Groups I and IV behave alike in that both show on the unreinforced trials of the second and possibly of the third cycle a general level of performance strikingly similar to that of Group III. However, there is less discrepancy between the acquisition and extinction curves of these groups than there is between those of Group III. This is indicative of a lesser degree of generalization of reinforcement, and certainly reflects the difference of degree of decrement in response to the positive stimulus exhibited in the first cycle. A great difference in the strength of the negative and positive response develops progressively day by day after this initial strengthening of response on the negative trials. The trend is interrupted by daily spontaneous recovery. Significantly, the degree of spontaneous recovery exhibited by Group IV does not decrease progressively as does that of Group I; this would seem to be the only effect produced by the lesser degree of massing of extinction trials in Group IV.

Group II shows less generalization of the effects of reinforcement after the first reinforcement, so that on the second, third, and possibly the fourth cycle, the level of response to the negative stimulus is approximately equal to that to the positive. Subsequently, discrimination proceeds as it does in the other groups.

5. The Effect of Reinforcement of the Positive Stimulus on the Strength of the Response to the Negative.

The reciprocal effects of reinforcement and non-reinforcement of one reflex on the strength of the other are definite enough. Although the generalization of the effects of non-reinforcement is more clearly marked than the generalization of reinforcement, the latter is readily detectable in the differences between the various curves which have been presented.

To reveal any systematic changes in the increments in the strength of a response immediately after reinforcement of response to the same or to the other stimulus, the data on odd-numbered trials of discrimination learning have been plotted. Lines drawn between trials preceding and following reinforcement graphically demonstrate the effects of each reinforcement. These graphs are presented in Fig. 7. Inspection indicates that although the variability of the effect is too great to permit any more precise presentation, there is sufficient regularity to indicate certain general trends. The coordinates of the starting point of each line are indicative of the strength of the response before reinforcement; the slope of each line varies with the amount of the increment in strength resulting from the intervening reinforcement. The ‘induction’ effect is clearly marked in the log latent period plots for all groups.

Fig. 7. The effect of reinforcement on the following run. Odd-numbered trials are diagrammatically plotted, and the lines have been drawn between the points representing trials immediately preceding and following reinforcement. Daily runs are separated by vertical lines. The crosses indicate the strength of the first response to the reinforced stimulus during the discrimination procedure.
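
The construction of such a plot can be illustrated with a minimal sketch. It assumes, hypothetically, that an animal's record is a list of latent periods in seconds with a parallel list of flags marking the reinforced trials, and that common (base-10) logarithms are used; the function name and the sample values are illustrative and are not taken from the data. For each reinforced trial the function returns the log latent period on the preceding trial (the starting point of a line) and the change in log latent period to the following trial (which fixes the slope of the line). A negative change means a shorter latent period, that is, a stronger response.

```python
import math

def reinforcement_effects(latencies, reinforced):
    """Return (log latency before, change in log latency) around each reinforced trial.

    latencies  -- latent periods in seconds, one per trial (hypothetical layout)
    reinforced -- booleans, True where that trial was reinforced
    """
    effects = []
    # Only trials with both a preceding and a following trial can be used.
    for i in range(1, len(latencies) - 1):
        if reinforced[i]:
            before = math.log10(latencies[i - 1])
            after = math.log10(latencies[i + 1])
            effects.append((before, after - before))
    return effects

# Illustrative values only, not data from the experiment.
sample_latencies = [10.0, 5.0, 1.0, 2.0, 4.0]
sample_reinforced = [False, True, False, True, False]
for start, change in reinforcement_effects(sample_latencies, sample_reinforced):
    print(f"start={start:.2f} log s, change={change:+.2f} log units")
```

Keeping the starting value alongside the change makes it possible to see, as in the figure, whether the increment depends on the strength of the response before reinforcement.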
6. Retention

It was possible to test a limited number of animals in each group for retention. After a varying interval from the last day of discrimination training, each animal was run for two cycles. The data on Groups I and IV were made available by Mr. R. N. Berry, who obtained them in the course of experimentation on certain of the animals used in the present experiment. Three animals of Group I were run, all on the twenty-first day after the end of discrimination learning. Three of the nine animals of Group II were run on the twentieth, five on the twenty-first, and one on the twenty-second day. The four animals of Group III were run as follows: one on the tenth, two on the fifteenth, and one on the twentieth day. Two of the members of Group IV were run on the twenty-first day, and one on the twenty-ninth. Since there were evident no differences which could be attributed to differences in the intervals, the records of all rats in each group have been lumped together. The median measures for each trial are presented in Tables IA and IB. Three rats, two in Group II and one in Group III, failed to leave the starting box and are not included.
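
The pooling described here, in which the records of all rats in a group are lumped together and a median is taken for each trial, can be sketched as follows. The data layout (one list of measures per animal, all of equal length), the function name, and the sample values are assumptions made for illustration; none of the numbers come from the tables.

```python
from statistics import median

def median_by_trial(records):
    """Pool animals and return the median measure on each trial.

    records -- list of per-animal lists, all of the same length
    """
    # zip(*records) regroups the values so that each tuple holds one trial.
    return [median(trial_values) for trial_values in zip(*records)]

# Illustrative values only: three hypothetical animals, four trials each.
group_records = [
    [0.4, 1.2, 1.5, 1.6],
    [0.6, 1.0, 1.7, 1.4],
    [0.5, 1.4, 1.3, 1.8],
]
print(median_by_trial(group_records))
```

Animals that failed to leave the starting box would simply be omitted from `records` before the medians are taken, as was done with the three rats excluded above.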

The discrimination shows little impairment over the intervals used. It should be observed, however, that a great degree of recovery from extinction has occurred, and that the effect of generalization of reinforcement is concurrently augmented. This great degree of retention is more familiar in conditioning experiments than in the usual ‘rewarded’ learning situation.

7. Individual Differences.

Certain individual differences appeared in the course of discrimination learning. None of these was such as to impair the description of the course of such learning which has been offered. The first of these differences which should be noted is in the rate of development of the discrimination. The latent period data of some animals clearly showed differential behavior long before it was evidenced in the group curves; in other animals, it was delayed for several cycles. Similarly, discrimination in the response measured by running time was in some rapidly developed, and in others slowly. In the latter case, a degree of generalization like that appearing in the latent period data occurred. One strikingly ‘slow’ animal showed no discrimination in the latent period until the twenty-first cycle, and in running time, until the fourteenth.

It was also observable that some curves, although they showed the changes characteristic of the group curves, were displaced upward one to two tenths of a log unit. These rats, then, ran uniformly more slowly.

Rarely, there appeared a deviation of which no account can be given. All runs of a given rat on a single cycle have in a few cases been displaced upwards as much as a log unit, but the form of the resulting curve was not different from that of the preceding and following cycles.
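
Because the measures are plotted on a logarithmic scale, an upward displacement of a curve corresponds to a constant multiplicative change in the time measured. The short sketch below converts the displacements mentioned above into such factors. It assumes that common (base-10) logarithms were used, which is not stated in this section; the function name is illustrative. Under that assumption, one to two tenths of a log unit corresponds to times roughly 1.26 to 1.58 times as long, and a full log unit to times ten times as long.

```python
def slowdown_factor(log_shift, base=10.0):
    """Multiplicative change in time implied by an upward shift on a log scale.

    base=10.0 is an assumption (common logarithms).
    """
    return base ** log_shift

for shift in (0.1, 0.2, 1.0):
    print(f"upward shift of {shift} log unit: times longer by a factor of "
          f"{slowdown_factor(shift):.2f}")
```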

8. Observational Material.

During the course of extinction of the response whose strength was measured by the running time, certain forms of behavior appeared which were remarkably consistent from rat to rat.

The first of these appeared typically during that part of learning when the latent period indicated that complete generalization of reinforcement to both positive and negative stimuli had occurred. It can best be described as ‘putting on the brakes.’ As soon as the starting door was opened, the animal ran out at a considerable speed. As it proceeded down the alley, it slowed down rapidly in a manner which indicated active suppression of the running behavior; the animal, as it were, stopped itself. After some hesitation, it would then proceed slowly to the food box, or return to the starting box.

The second has been termed ‘abortive entrance’: the animal, actively exploring the region of the food box, ran into the food box, at the same time turning in such a manner that no sooner were its rear quarters in the box than its head appeared from inside. When the movement was complete, the animal usually remained sitting for a short time and then resumed its exploratory activity. Such ‘abortive entrance’ occasionally occurred several times during one trial.

The third form of behavior observed was a marked increase in exploratory behavior. The animal moved from place to place up and down the alley, vigorously vibrating its vibrissae, moving its head, and leaning far off the alley. The regions in which exploratory behavior occurred most extensively were in the immediate vicinity of the food box and in and about the starting box. Many animals would attempt to clamber up the screens, and occasionally one would climb down on the rubber roller and support which maintained the runway in position. When this occurred, the animal was immediately replaced on the alley. Interpolated between such exploratory behavior was running back and forth on the alley. Some animals ran up and down the alley as many as six times in the course of two minutes.

These phenomena are very similar to those reported by Wendt (15) in his investigation of loudness discrimination in monkeys. However, in his experiment, such activity occurred between the presentation of the signaling stimulus and that of the conditioned stimulus. Wendt considered that his animals might have been expected to show indications of Pavlovian ‘inhibition of delay’ during this interval.

Only once was there observed behavior symptomatic of inhibition as it was conceived by Pavlov. Even on those trials when the animals were developing discrimination in the reflex measured by the latent period, they were actively exploring the starting box and the door region. ‘Huddling,’ such as that reported by Gagné (6) and others did not appear.
When the door of the food box was closed, it invariably rested on the tail of the animal. On unreinforced trials the tail was withdrawn by the animal almost immediately. But occurrence of the reinforcing reflex clearly seemed to inhibit such withdrawal. During the course of discrimination training, observations were made on this. At one minute after closure of the door to the food box, on 1004 of a total of 1104 reinforced trials, the tail was not withdrawn. Corresponding data on trials where reinforcement was absent yield a frequency of 21 out of 3312 possible.
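
The two frequencies just reported are easier to compare as proportions. The following sketch does the arithmetic; the counts are those given above, while the function and variable names are illustrative. The total of 3312 unreinforced trials is exactly three times the 1104 reinforced trials.

```python
def proportion(count, total):
    """Fraction of trials on which the tail was not withdrawn."""
    return count / total

# Counts as reported in the text.
reinforced = proportion(1004, 1104)
unreinforced = proportion(21, 3312)

print(f"reinforced trials:   {reinforced:.1%}")    # 90.9%
print(f"unreinforced trials: {unreinforced:.1%}")  # 0.6%
print(f"unreinforced per reinforced trial: {3312 / 1104:.0f}")  # 3
```

The tail thus remained under the door on about nine reinforced trials in ten, and on fewer than one unreinforced trial in a hundred.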
Defecation and urination never occurred on a reinforced trial. This also seems to suggest inhibition of conflicting responses by eating behavior. Their occurrence on non-reinforced trials, however, was not so frequent as to suggest an emotional effect of non-reinforcement.

Discussion
According to the present analysis, the two habits manifested in the behavior which has been studied constitute two chains composed of two reflexes each. Skinner (13, 14) has defined the reflex as ‘the correlation of a stimulus and a response at a level of restriction marked by the orderliness of changes in the correlation.’ By this definition, the analysis of the behavior into at least two, and possibly four, reflexes is correct, and only the applicability of the concept of chaining is open to question. If this concept is rejected, then it would seem to follow that in measuring both latent period and running time, two different measures of the same reflex have been made. But the striking differences in the behavior of these two measures following the same procedure of reinforcement and non-reinforcement render this highly improbable so that it may be concluded that four, and not two, reflexes have been investigated, and that these reflexes are organized into two chains. This investigation, then, has determined the course of the changes in the strength of the members of the two chains occurring as a consequence of differential reinforcement of the chains. Such changes have been revealed as orderly and gradual. They are not independent of one another; a major degree of interaction occurs.
Discrimination develops in those reflexes immediately preceding the reinforcing reflex as the simultaneous strengthening of one and extinction of the other. This statement is qualified only by the extent of the changes in the strength of one induced by reinforcement or non-reinforcement of the other. Its course conforms with that described by Hilgard, Campbell, and Sears (8), and Skinner (14).
The development of discrimination in reflexes more ‘distant’ from reinforcement cannot be so simply described. Initially the reinforced reflex shows induced changes in strength equal to the direct changes in strength resulting from periodic reinforcement. This is comparable to the great degree of ‘generalization of excitation’ reported by Pavlov (12) as appearing in the trace conditioned reflex. The weakening of the reinforced reflex which eventually appears follows a course suggestive of what Pavlov has termed ‘negative induction.’ ‘Induction’ is one of the few findings of Pavlov which have received only limited attention. It has played no part in the construction of conditioned reflex theories of learning. Although tentative hypotheses may be advanced with respect to this phenomenon, a need for critical research is indicated. When non-reinforcement is considered as simple failure of occurrence of the reinforcing reflex without regard to the previous occurrence or non-occurrence of the reflex being extinguished, this finding cannot be considered as a correlate of the interval between the presentation of the stimulus and the non-reinforcement of response to it. It is proposed that the consequences of variation of the interval between reinforcement of one reflex and the presentation of the stimulus to another, which has not been reinforced, be investigated.
The data which have been presented bear close resemblance in detail to comparable data obtained in the classical differential conditioning experiments of Pavlov (12). In both cases, trial-by-trial measure of response in individual animals clearly shows the phenomena which Pavlov described; here, the results are shown to occur in group as well as in individual data.
The phenomena which have been reproduced are those from which Pavlov derived many of the concepts which he employed in the description of discrimination. These concepts include: ‘generalization of excitation,’ ‘differential inhibition,’ ‘inhibitory after-effect,’ and ‘induction.’ The foundation of his proposed measure of stability of differentiation is indicated. Other Pavlovian theoretical notions whose empirical basis has been verified are ‘extinction,’ ‘inhibition,’ and ‘excitation.’ The phenomena from which were inferred the properties of these processes have also been largely confirmed: spontaneous recovery (‘lability of inhibition’) and the effects of spacing on extinction, among others. In both sets of experiments, ‘warm-up’ and evidence of discrimination on the first closely successive presentation of positive and negative stimuli appeared. There is no evidence in the present experiment on ‘external inhibition,’ ‘disinhibition,’ and ‘irradiation and concentration of excitation and inhibition.’
In one striking respect, Pavlov’s account has been found inadequate. The behavior manifested during the extinction of the unreinforced reflexes bears little resemblance to that described by Pavlov, but it is rather suggestive of the formulation proposed by Guthrie (7) and Wendt (15). The great activity of our animals during extinction of response to the negative stimuli is in marked contrast to the general suppression of activity described by Pavlov as appearing during extinction and during the latent periods of delayed and trace reflexes.
The data presented by later experimenters (9) on the conditioned reflex which have led to the concept of ‘inhibition of reinforcement’ with subsequent spontaneous recovery seem to be substantiated by our data on the behavior of the latent period during the first days of experimentation.
Insofar as the present experiment has repeated the findings of Pavlov, it tends to confirm those theoretical treatments which have been based on his work. Indeed, the form of the curves presented seems to confirm in detail the hypotheses of Spence with respect to the magnitude of increments and decrements in reaction tendencies as a consequence of reinforcement and non-reinforcement. Since we have used only two pairs of stimuli, no conclusion can be drawn with respect to the ‘gradients of generalization.’ However, the technique of the present experiment might well be used in the determination of the form of such gradients.

Summary
1. The present experiment has considered the acquisition of discrimination in a simple form of behavior. This behavior consists in running down a narrow runway to a food box when the door of a starting box is opened. Analysis of this habit indicates that it may be considered a chain of two reflexes. The measures of strength of these two reflexes which have been employed are termed respectively latent period and running time. The course of changes in strength of both reflexes in each of two such chains resulting from differential reinforcement of the chains has been traced. As a control, the effect of the periodic reinforcement of one chain has been determined.
2. The experimental design permits control of preliminary training, drive, amount of reinforcement, spacing and massing of trials, and magnitude of both positive and negative stimuli. The independent variable is the systematic association of reinforcement and non-reinforcement with responses to two stimuli, black alley and food box, and white alley and food box. The dependent variable is strength of response to those stimuli.
3. The periodic reinforcement of a single chain produces initially a series of extinction curves between successive reinforcements. These become progressively lower. Ultimately the strength of response reaches a high and stable level unaffected by single reinforcements.
4. Differential reinforcement of two chains results in the gradual development of discrimination. The strength of response to the positive stimuli follows the course of a smooth acquisition curve. The strength of the second reflex in the unreinforced chain decreases following a course typical of extinction; the strength of the first initially increases with the strength of the reinforced chain before evidence of discrimination clearly appears.
5. The following phenomena, familiar from Pavlovian conditioning, have been observed: extinction, spontaneous recovery, generalization of the effects of reinforcement and non-reinforcement, and induction.
6. The behavior of the animals during extinction does not exhibit the symptoms of Pavlovian inhibition.
7. A decremental effect of massed practice is evident in the data. ‘Warming-up’ also appears.
8. The course of acquisition of discrimination in the locomotor response which has been studied conforms in detail with that described by Pavlov in the conditioned salivary reflex.

(Manuscript received June 15, 1942)
References
1. Ellson, D. G. Quantitative studies on the interaction of simple habits. I. Recovery from specific and generalized effects of extinction. Journal of Experimental Psychology, v. 23, pp. 339-358, 1938.
2. Finger, F. W. The effect of varying conditions of reinforcement upon a simple running response. Journal of Experimental Psychology, v. 30, pp. 53-68, 1942.
3. Gagné, R. M. External inhibition and disinhibition in a conditioned operant response. Journal of Experimental Psychology, v. 29, pp. 104-116, 1941.
4. Gagné, R. M. The effect of spacing of trials on the acquisition and extinction of a conditioned operant response. Journal of Experimental Psychology, v. 29, pp. 201-216, 1941.
5. Gagné, R. M. The retention of a conditioned operant response. Journal of Experimental Psychology, v. 29, pp. 296-305, 1941.
6. Graham, C. H. & Gagné, R. M. The acquisition, extinction, and spontaneous recovery of a conditioned operant response. Journal of Experimental Psychology, v. 26, pp. 251-280, 1940.
7. Guthrie, E. R. The Psychology of Learning. New York: Harper and Bros., 1935, pp. viii and 258.
8. Hilgard, E. R., Campbell, A. A., & Sears, W. N. Conditioned discrimination: the effect of knowledge of stimulus relationships. American Journal of Psychology, v. 51, pp. 498-506.
9. Hovland, C. I. ‘Inhibition of reinforcement’ and phenomena of experimental extinction. Proceedings of the National Academy of Sciences, v. 22, pp. 430-433, 1936.
10. Hovland, C. I. The generalization of conditioned responses. III. Extinction, spontaneous recovery, and disinhibition of conditioned and of generalized responses. Journal of Experimental Psychology, v. 21, pp. 47-62, 1937.
11. Hull, C. L. The goal gradient hypothesis and maze learning. Psychological Review, v. 39, pp. 25-43, 1932.
12. Pavlov, I. P. Conditioned Reflexes. New York: Oxford University Press, 1927, pp. xv and 430. (Tr. by G. V. Anrep.)
13. Skinner, B. F. The generic nature of the concepts of stimulus and response. Journal of General Psychology, v. 12, pp. 40-65, 1935.
14. Skinner, B. F. The Behavior of Organisms. New York: D. Appleton-Century Co., 1938, pp. ix and 457.
15. Wendt, G. R. An interpretation of inhibition of conditioned reflexes as competition between reaction systems. Psychological Review, v. 43, pp. 258-281, 1936.
16. Williams, S. B. Transfer of extinction effects in the rat as a function of habit strength. Journal of Comparative Psychology, v. 31, pp. 263-280, 1941.

Footnotes
1 This paper was presented as a thesis to the Faculty of the Graduate School of Brown University, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Psychology, June 1941. The author is indebted to Professor C. H. Graham of Brown University, under whose direction the present research was carried out.
2 Such an analysis of the behavior under investigation may be open to question as a gross over-simplification. The writer is fully aware of this vulnerability to criticism, but accepts it nonetheless in view of its ready submissibility to experimentation and of the adequacy of its description of the behavior observed and results obtained.
3 See exception, p. 458.
4 Complete tables, including the means, average deviations, and quartile deviations calculated are obtainable in the John Hay Library of Brown University.
5 Such curves for individual animals do not differ in any significant respect from the group curves. Curves for each animal may be obtained by application to the writer.
