Dawei Jin, Hanbo Yan, Intervention Effects in Mandarin Chinese—An Experimental Study, Journal of Semantics, Volume 41, Issue 2, May 2024, Pages 121–148, https://doi.org/10.1093/jos/ffae006
Abstract
This paper presents a formal judgment study of Mandarin intervention effects, that is, structures containing a wh-phrase c-commanded by a focus-sensitive or a quantificational expression. There has been significant disagreement in the literature over which types of wh-phrase give rise to intervention, and over which of the c-commanding scopal operators act as interveners. These competing empirical claims have, to this day, not been evaluated experimentally. The results of our study show that wh-nominals and wh-adverbials exhibit a similar pattern of degraded acceptability. Our results further show a clear distinction between why and other wh-phrases, favoring Ko’s (2005) idea that a separate, why-induced scope effect should be disentangled from the garden-variety case of wh-intervention.
1 INTRODUCTION
The configuration in (1) has been shown to give rise to a degradation known as the intervention effect: an interrogative (Q) operator must associate with a wh-phrase within its scope, but acceptability is degraded when the wh-phrase is c-commanded by a closer focus-sensitive or quantificational element. The intervention effect disappears when the wh-phrase scrambles to a position structurally higher than the focus/quantificational element.

This paper investigates the pattern of intervention in Mandarin Chinese in-situ wh-questions. The intervening configuration and its circumvention under scrambling are illustrated for Mandarin in (2a) and (2b), respectively.1
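The configuration in (1) is a purely structural condition, so it can be checked mechanically. The Python sketch below is a toy illustration, not part of the study. It tests whether Q c-commands a potential intervener that in turn c-commands the wh-phrase. It uses the standard first-branching-node definition of c-command. The labels (`Q`, `only-NP`, `wh`, `t`) and the flat tree shapes are hypothetical simplifications loosely modelled on (2a) and (2b).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality: nodes with equal labels stay distinct
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Give every child a back-pointer to this node.
        for child in self.children:
            child.parent = self


def dominates(a: Node, b: Node) -> bool:
    """True if a properly dominates b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """Neither node dominates the other, and the first branching node
    dominating a also dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and dominates(node, b)


def find(root: Node, label: str) -> Node:
    """Return the first node with the given label (pre-order search)."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.label == label:
            return node
        stack.extend(reversed(node.children))
    raise LookupError(label)


def intervention_configuration(root: Node, q: str, intervener: str, wh: str) -> bool:
    """Configuration (1): Q c-commands the intervener, which c-commands wh."""
    q_node, x_node, wh_node = find(root, q), find(root, intervener), find(root, wh)
    return c_commands(q_node, x_node) and c_commands(x_node, wh_node)


# Hypothetical toy LFs loosely modelled on (2a) (in situ) and (2b) (scrambled).
in_situ = Node("CP", [Node("Q"), Node("TP", [
    Node("only-NP"), Node("VP", [Node("V"), Node("wh")])])])
scrambled = Node("CP", [Node("Q"), Node("TP", [
    Node("wh"), Node("TP'", [Node("only-NP"), Node("VP", [Node("V"), Node("t")])])])])

print(intervention_configuration(in_situ, "Q", "only-NP", "wh"))    # True
print(intervention_configuration(scrambled, "Q", "only-NP", "wh"))  # False
```

In the scrambled tree the intervener no longer c-commands the wh-phrase, which mirrors the circumvention effect. The check says nothing about which wh-phrases are sensitive to the configuration; that is the question at issue in section 2.1.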

The exact pattern of judgments in Mandarin is far from straightforward. The literature disagrees over whether the degradation, where it arises, is sensitive to the choice of wh-phrase (e.g. when the which-phrase is replaced by who, what, how or why). It is further debated whether quantifier phrases (e.g. most, every, no, few) trigger the same degree of degradation as focus phrases.2 The acceptability data at the heart of the debate are thus very subtle, involving multiple gradient contrasts (Gibson & Fedorenko, 2010; Gibson et al., 2013; Linzen & Oseki, 2018; Schuetze, 1996). The objective of this paper is to obtain a clearer understanding of these contrasts through an experimental evaluation. We present rating results for stimuli instantiating the configurations in (2a) and (2b), manipulating the subtypes of wh-items and of potential interveners. We find that nominal and adverbial wh-phrases are similarly degraded in an intervening configuration, challenging a long line of research dating back to Aoun & Li (1993). The one exception involves the why-adverb: why induces stronger degradation in an intervening configuration than the other wh-phrases, including both wh-adverbs and wh-nominals.3 We argue that this distinct acceptability pattern points to a separate source of ungrammaticality unique to the why-adverb (e.g. Ko, 2005).
The rest of this paper is structured as follows. Section 2 presents the relevant background on intervention effects in Mandarin Chinese. Section 3 reports the experiments and Section 4 concludes the paper.
2 THEORETICAL BACKGROUND
2.1 Wh-type
2.1.1 Wh-nominals and wh-adverbials
We start with the question of whether there is variation among wh-phrases when they are c-commanded by a focus-sensitive or a quantificational expression. We specifically test the prevailing approach to Mandarin wh-intervention according to which nominal wh-phrases differ from wh-adverbials by inducing no intervention (e.g. Aoun & Li, 1993; Kim, 2002; Law, 2001; Soh, 2005; Tsai, 2008; Yang, 2011). A nominal expression as defined in Tsai (1994) is one whose structure contains a (theta role-bearing) nominal element. Expressions like what or in what way are treated as nominal. A different, referentiality-based criterion is adopted in Reinhart (1998), wherein what and in what way are also classed together, as both range over individuals. The question we are addressing is thus whether configurations like (3) are exempt from intervention (% indicates disagreement).

The approach we are interested in is built upon the view that intervention effects are a locality constraint on movement. While exact implementations vary, the core claim is that the interrogative Q operator cannot move across a quantifier/focus phrase if the latter is closer to the clausal scope position than Q. For example, Yang (2011) draws on Rizzi (1990) and Starke (2001) in arguing that a quantifier/focus phrase introduces an operator that matches the feature (or set of features) of Q, thereby inducing a minimality effect. Crucially, the notion of nominality matters, because movement is assumed to relate to the way Q binds the wh-restrictor within its domain. Two binding mechanisms are proposed to be available in Mandarin (Aoun & Li, 1993; Cheng, 1991; Cole & Hermon, 1994; Stepanov & Tsai, 2008; Tsai, 1994). In one mechanism, Q forms a unit with its restrictor at the in-situ position before moving to the clause edge, where it is interpreted for scope (assuming Pesetsky 2000’s feature movement). In the alternative mechanism, Q directly occupies the scope position and binds its restrictor at a distance (Pesetsky 1987’s unselective binding), similar to the behavior of other sentential operators, as with NPIs and adverbs of quantification. It is further proposed that only nominal phrases introduce the right type of variable to be bound unselectively. Wh-adverbials (wh-phrases that do not contain a transparent, divisible nominal component) always trigger feature movement.
With this assumption, the approach predicts that intervention effects in Mandarin are found with wh-adverbials, not wh-nominals (Cheng & Rooryck, 2000; Law, 2001; Soh, 2005; Tsai, 1994; Yang, 2011). Compare the LF structure of a wh-nominal (as in 4a) with a wh-adverbial (as in 4b). (4a) is not problematic: Q is merged directly at the clause edge, violating no locality constraint. In contrast, the structure in (4b) is ruled out, as Q moves across an intervening focus operator.

The approach is further compatible with the circumvention of intervention under scrambling, i.e. the absence of degradation when the wh-restriction moves across the intervening operator. The gist is that scrambling involves phrasal movement in the sense of Pesetsky (2000). A distinction is drawn between feature movement (only Q moves) and phrasal movement, in which the entire wh-chunk (Q together with its restriction) is pied-piped to the clause edge.
The above discussion pivots on the claim that Q binds into its wh-restrictor in two different ways, a claim independently supported by diagnostics of feature movement in Mandarin. Importantly, strong island effects with wh-in-situ are known to exhibit a nominal-adverbial asymmetry (Aoun & Li, 1993; Hagstrom, 1998; Huang, 1982; Nishigauchi, 1990; Tsai, 1994; Watanabe, 1992). As (5) illustrates, Mandarin permits wh-nominals in island contexts, whereas adverbials like how or why are not possible.

The insensitivity of nominals to island constraints is captured if one assumes that Q does not move. The literature thus generally assumes that wh-nominals are interpreted via unselective binding, as this explains the circumvention of island violations: wh-scope taking does not require moving across the island domain. Now, if there really is a further nominal-adverbial asymmetry in intervention effects, then by assuming two distinct modes of operator binding we can capture both asymmetries with a unified explanation.
In contrast to the above approach, many accounts of wh-intervention do not identify intervention effects with the violation of a condition on movement. As an example, consider the Separation Principle as formulated in Pesetsky (2000, 67). According to this principle, a semantic restriction on a quantifier (including wh) may not be separated from that quantifier by a scope-bearing element. Given some mechanism that implements this principle, intervention follows independently of movement. As a result, wh-nominals and wh-adverbials alike are predicted to induce intervention.4
To see how such mechanisms work, we illustrate with the focus evaluation account in Beck (2006), which (as Beck (2006, 22) states explicitly) can be viewed as an interpretive strategy underlying Pesetsky’s abstractly formulated constraint. The crucial assumption is that a focus-sensitive operator necessarily associates with all the alternatives within its scope (Rooth, 1992), not just the one it is co-indexed with. As a result, the alternatives introduced by these foci can no longer be passed on to a higher focus-sensitive operator for the calculation of its alternative set.
We again illustrate using only. Consider the configuration in (6), in which only ‘intervenes’ between Q and its wh-alternatives.5

Beck assumes that C denotes a contextually relevant subset of the focus semantic value of the clausal node. Once fixed, the value of C restricts only. The Roothian ∼ operator determines C’s value based on the focus semantic contribution within its domain. The mother node of ∼ then has a focus semantic value identical to its ordinary semantic value, so that the focus semantic values of the variables corresponding to the focus DP and to wh are no longer available for compositional interpretation above ∼’s mother node. This gives rise to an interpretation failure: for interpretation to proceed at the sentence level, Q needs to use the focus semantic contribution of wh. However, because the focus value of the intervening ∼’s mother node is reset to its ordinary semantic value, the alternatives introduced by wh cannot be used by any operator above only. Hence the focus contribution of wh is never evaluated by Q. Importantly, interpretation is ruled out in any structure where an offending operator intervenes between Q and its restrictor at LF, so that both the base-generation of Q and feature movement give rise to intervention.6
The mechanism also accounts for the absence of intervention under scrambling. In the configuration (7), the wh-phrase scrambles across the ∼ operator to the Q scope, leaving behind a trace. Crucially, the trace left by scrambling is assumed to make no focus semantic contribution.7 Consequently, the structure in (7) is interpreted without a problem: under the assumption in Beck (2006), ∼ simply evaluates the alternatives from only’s focus associate DP, and Q can evaluate the alternatives from wh.
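The contrast between (6) and (7) can be mimicked with a deliberately crude Python model of Roothian two-dimensional semantics. The model is a hypothetical simplification, not Beck’s (2006) formalization, and it makes these assumptions:

- a ‘proposition’ is just a tuple of words;
- composition is concatenation, computed pointwise over alternatives;
- a wh-phrase has an undefined ordinary value, as Beck assumes;
- ∼ resets its mother’s focus value to the ordinary value;
- a scrambling trace contributes no alternatives.

The individuals `Lisi`, `Zhangsan` and `book-a` to `book-c` are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product
from typing import Optional

Prop = tuple  # toy "proposition": a tuple of words


@dataclass(frozen=True)
class Sem:
    ordinary: Optional[Prop]  # None = undefined ordinary value
    focus: frozenset          # focus semantic value: a set of alternatives


class InterpretationFailure(Exception):
    pass


def word(w: str) -> Sem:
    return Sem((w,), frozenset({(w,)}))


def focused(w: str, alts: list[str]) -> Sem:
    return Sem((w,), frozenset((a,) for a in alts))


def wh(alts: list[str]) -> Sem:
    # Following Beck (2006): no ordinary value, alternatives only.
    return Sem(None, frozenset((a,) for a in alts))


def concat(*parts: Sem) -> Sem:
    """Pointwise composition, modelled here as concatenation of words."""
    if any(p.ordinary is None for p in parts):
        ordinary = None
    else:
        ordinary = sum((p.ordinary for p in parts), ())
    focus = frozenset(sum(combo, ()) for combo in product(*(p.focus for p in parts)))
    return Sem(ordinary, focus)


def squiggle(s: Sem) -> Sem:
    """Rooth's ~: the mother's focus value is reset to {ordinary value}.
    With an undefined ordinary value, no alternatives survive."""
    if s.ordinary is None:
        return Sem(None, frozenset())
    return Sem(s.ordinary, frozenset({s.ordinary}))


def only(s: Sem) -> Sem:
    """'only' evaluates every focus in its scope via ~ (truth conditions ignored)."""
    r = squiggle(s)
    ordinary = None if r.ordinary is None else ("only",) + r.ordinary
    return Sem(ordinary, frozenset(("only",) + p for p in r.focus))


def scramble(wh_phrase: Sem, remnant: Sem, trace: str = "t") -> Sem:
    """wh [lambda-t remnant]: each wh-alternative is substituted for the trace."""
    return Sem(None, frozenset(
        tuple(alt[0] if w == trace else w for w in p)
        for alt in wh_phrase.focus for p in remnant.focus))


def question(s: Sem) -> frozenset:
    """Q turns the focus value of its sister into the question denotation."""
    if not s.focus:
        raise InterpretationFailure("no wh-alternatives reach Q")
    return s.focus


LISI_F = focused("Lisi", ["Lisi", "Zhangsan"])
BOOKS = ["book-a", "book-b", "book-c"]

# (6)-style: Q [only ~ [Lisi_F read wh]]
lf_in_situ = only(concat(LISI_F, word("read"), wh(BOOKS)))
# (7)-style: Q [wh [only ~ [Lisi_F read t]]]
lf_scrambled = scramble(wh(BOOKS), only(concat(LISI_F, word("read"), word("t"))))
```

In the in-situ LF, ∼ swallows the wh-alternatives before Q is reached, so `question(lf_in_situ)` raises `InterpretationFailure`. In the scrambled LF, wh combines above ∼ and its three alternatives survive. Nothing in the model depends on whether Q is base-generated or moved. This is why accounts of this kind predict no nominal-adverbial asymmetry.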

Note that, in parallel with overt scrambling (i.e. overt phrasal movement), covert scrambling of a wh-phrase is also insensitive to intervention. Here we assume with Pesetsky (2000) that an entire wh-phrase may move at LF. This contrasts with LF feature movement, where the wh-restriction stays in situ. Importantly, the LF structure of a sentence with a wh-phrase covertly moved across an intervener looks the same as one with an overtly scrambled wh-phrase, as in (7), repeated below.

We have already seen that the structure in (7) is unproblematic. Covert phrasal movement is therefore a potential rescue mechanism whenever a wh-phrase is acceptable in an intervening configuration.
To conclude, the literature makes differing predictions with regard to the nominal-adverbial asymmetry. One camp of analyses derives intervention effects from the violation of a condition on movement. Crucially assuming that the question operator binds into its restriction without movement for wh-nominals, these analyses predict a nominal-adverbial asymmetry. By contrast, if intervention is explained independently of the movement of Q, all structures with an offending operator intervening between Q and its restrictor at LF are problematic, so that both the base-generation of Q and feature movement give rise to intervention. Hence, these accounts predict no distinction between the intervention behavior of wh-nominals and wh-adverbials. The two predictions will be tested in our experiments.
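The two sets of predictions can be summarized as a small decision function. This is a hypothetical encoding of the prose above, not code from any of the cited works. The names `Account`, `predicts_intervention` and `predicts_asymmetry` are illustrative. The function deliberately sets aside rescue by covert phrasal movement and the special status of why.

```python
from enum import Enum


class Account(Enum):
    MOVEMENT = "intervention as a locality condition on Q-movement"
    INTERPRETIVE = "intervention as interpretation failure (Pesetsky 2000; Beck 2006)"


def predicts_intervention(account: Account, wh_type: str, scrambled: bool) -> bool:
    """Is Q ... intervener ... wh predicted to be degraded? (simplified)"""
    if wh_type not in ("nominal", "adverbial"):
        raise ValueError("wh_type must be 'nominal' or 'adverbial'")
    if scrambled:
        # Both camps: phrasal movement over the intervener circumvents intervention.
        return False
    if account is Account.MOVEMENT:
        # Nominals: Q is base-generated at the edge and binds unselectively.
        # Adverbials: Q undergoes feature movement across the intervener.
        return wh_type == "adverbial"
    # Interpretive accounts: any operator between Q and its restrictor at LF.
    return True


def predicts_asymmetry(account: Account) -> bool:
    """Does the account predict in-situ nominals and adverbials to differ?"""
    return (predicts_intervention(account, "nominal", scrambled=False)
            != predicts_intervention(account, "adverbial", scrambled=False))


for acc in Account:
    print(acc.name, predicts_asymmetry(acc))
# MOVEMENT True
# INTERPRETIVE False
```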
2.1.2 Some complications
We now discuss additional, subtler distinctions among wh-phrases that are relevant to the theoretical approaches to intervention. First, it should be stressed that, among adverbials, why has been observed to take scope differently. Most analyses in the literature agree with the judgment that the why-adverbial in Mandarin gives rise to robust degradation. Crucially, however, there is an alternative theory that derives this degradation from a unique property of why (Jin, 2019; Ko, 2005; Li, 2011). According to this theory, why neither moves at LF nor is bound at a distance. Rather, it is attached high: its base position is where it is interpreted, i.e. the scope position of the Q operator. As focus-sensitive operators like only stay within the scope of Q, they also stay below why’s position, as in (9). A structure in which they c-command why is thus uninterpretable.

Given that there is no operator/scopal element intervening between Q and its wh, the degradation created by a scopal operator c-commanding why is an altogether distinct phenomenon (cf. Soh 2005 for arguments against Ko’s scope-based explanation). This phenomenon is crucially based on the special property of why, and cannot be extended to other wh-adverbials. To determine whether nominals and adverbials differ in intervention, non-why adverbials have to be considered. For instance, some authors explicitly state that other wh-adverbials similarly induce intervention, e.g. zenme ‘how’ in (10). This claim needs to be tested.

Second, within wh-nominals, the which-phrase has been claimed to exhibit a different pattern. This dates back to the observation in Pesetsky (2000) that (in-situ) which-phrases in English induce intervention, unlike what. Pesetsky also points to the insensitivity of which to the superiority constraint: in superiority-violating configurations, which-in-situ does not give rise to degradation, indicating that it is exempt from the superiority effect, unlike what. Pesetsky argues that these behaviors are accounted for if the which-phrase is interpreted via feature movement. Several authors have also pointed to a potential difference between Mandarin which-phrases and what (e.g. Beck, 2006; Kim, 2002). In an informal study, the consultants in Beck (2006, 27-28) reported a robust intervention effect in (11a), in contrast to the judgments of shenme in (11b), on which there was no agreement (notation from Beck 2006; % indicates disagreement).

We thus see that, besides the lack of clarity on whether nominals lead to intervention, there is also no consensus on whether nominals are homogeneous. The point is interesting because a which-what distinction, if established, could be evidence for the mechanism of covert phrasal movement. Recall from section 2.1.1 that the configuration derived by covert phrasal movement does not involve an intervening operator. Thus, Beck (2006) suggests that a difference between the which-phrase and what could be explained by assuming that what (and the like, namely non-D-linked wh-arguments) undergoes phrasal movement at LF (rather than being bound unselectively).8 Which, on the other hand, would be unable to undergo phrasal movement, and hence the which-in-situ structure in (11a) is unacceptable. Given these implications for theories of intervention (the role of D-linking and how it factors into mechanisms of movement), the pattern of judgments in which-contexts needs to be clarified. We return to this issue in section 3.4, after we have experimentally established the pattern of Mandarin intervention.
A final factor in delineating wh-types has to do with the view that wh-phrases are assigned a nominal or an adverbial status depending on the particular readings they give rise to. This issue is especially relevant for how, which is ambiguous between denoting a set of manners (slowly, carefully, etc.) and denoting a set of instruments/means (using pen/using pencil, on foot/by train, etc.). The two readings cut across the line of nominality under a category-based criterion (Stepanov & Tsai, 2008; Tsai, 2009, 1994, 2008), evidenced by the crosslinguistic pattern where the how-phrase denoting instruments tends to have an internal structure, containing an isolable, theta role-bearing nominal element. In contrast, the structure of the how-phrase denoting manners tends to resist further separation. The upshot is that only manner-denoting how is a wh-adverbial, in which case the Q operator has to move. Assuming a nominal-adverbial divide, this has the consequence that the intervention effects in how-questions are expected for the manner reading, but not for the instrumental reading. Thus it is necessary to consider the specific interpretation associated with a how-question, when evaluating the effect of nominality on intervention.
2.2 Focus and quantifier
We now turn to the question of whether wh-intervention varies with the type of intervener, i.e. scope-bearing items that c-command the wh-restriction and are c-commanded by the Q operator. Cross-linguistically, intervention is most robustly found with focus phrases. Quantifiers generally induce intervention as well, but their behavior is known to vary across languages (cf. Beck 2006; Hagstrom 2006; Kim 2002, 2006). The questions we are addressing are thus: Do quantifiers in Mandarin induce intervention in the same way as focus phrases? How does their behavior with wh-nominals compare to that with wh-adverbials? Specifically, we are interested in the pattern of acceptability associated with the two configurations instantiated by (12a) and (12b).

If the pattern of no differs from that of only, this complicates the predictions of existing theoretical approaches: theories arguing for a nominal-adverbial divide assume that nominal questions circumvent intervention for all intervener types, whereas theories against a nominal-adverbial asymmetry predict uniform degradation. In the absence of a converging pattern between focus and quantifiers, one could postulate additional mechanisms (e.g. two distinct intervention effects, cf. Soh 2005; Yang 2009, 2011).
Aside from the focus versus quantifier divide, another complication pertains to a fine-grained distinction among quantifier types (Jin, 2019; Ko, 2005). In the literature, the configuration with a most-NP is claimed to be acceptable (Beck, 1996; Grohmann, 2006; Jin, 2019; Ko, 2005; Li & Law, 2016; Mayr, 2013; Tomioka, 2007). In contrast, quantifiers like no, few and less than n pattern with focus phrases in giving rise to intervention. A Mandarin contrast is given in (13a) and (13b). We aim to replicate the reported contrast; moreover, since the contrast is based on why-questions, it remains to be tested whether other wh-questions follow the same pattern.

Establishing whether quantifiers are a homogeneous class of interveners is also theoretically important. In some theories, such as the dynamic semantic approaches of Honcoop (1998) and Haida (2007), all quantifiers are treated as interveners. Other analyses predict variation among quantifiers. A plethora of proposals predict that most is not an intervener, based on theory-specific classifications. Thus, for Tomioka (2007) and Ko (2005), the intervener status of a quantifier comes down to its ability to be interpreted as a topic. Most-NPs are topical, as evidenced by their ability to trigger wa-suffixation in Japanese and long-distance topicalization in Mandarin. By the same diagnostics, no-NPs are antitopical. Beck (1996, 32-34) suggests that most, but not no, can receive a group individual reading, in which case it should be treated as a plural indefinite and hence is not an intervener (cf. also Jin, 2019; Pafel, 1991). The additivity approach in Mayr (2013) separates quantificational interveners from non-interveners along monotonicity lines (see also Grohmann 2006 for an earlier observation). Monotone increasing quantifiers such as most are existential quantifiers by nature and hence additive. In contrast, decreasing quantifiers such as no can be shown to be non-additive, just like focus phrases. Another approach, Li & Law (2016), locates the underlying demarcation not in monotonicity but in the divide between strong and weak quantifiers. As no-NPs are ruled out for being weak quantifiers, a distinction between most and no is again predicted. We therefore test the intervention pattern of most against no to better understand the quantifier-internal distinction. The choice is practical in the current experimental setting: the two quantifiers are reported to differ in the previous literature, and the resulting pattern allows us to decide among some of the theories, though not all.
To sum up, the literature so far has provided a large body of often divergent and very subtle data. At the same time, a clearer understanding of the empirical patterns involving both wh-types and intervener types is important for determining how best to explain intervention effects. We now turn to the experiments, which help us decide on the acceptability of structures instantiating Mandarin intervention.
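Two of the criteria above are model-theoretic and can be checked mechanically. The Python sketch below brute-forces right monotonicity and symmetry for most and no over a four-element toy domain. Two simplifications are assumptions of the sketch, not the cited authors’ definitions:

- Mayr’s (2013) additivity criterion is approximated as ‘not upward monotone’.
- The strong/weak divide of Li & Law (2016) is approximated by symmetry (Q(A,B) iff Q(B,A)), a common diagnostic for weak determiners.

The topicality and group-reading criteria are not model-theoretic and are left out.

```python
from itertools import chain, combinations

DOMAIN = range(4)  # toy domain of individuals
SETS = [frozenset(c) for c in chain.from_iterable(
    combinations(DOMAIN, r) for r in range(len(DOMAIN) + 1))]


def most(a: frozenset, b: frozenset) -> bool:
    return len(a & b) > len(a - b)


def no(a: frozenset, b: frozenset) -> bool:
    return not (a & b)


def upward_monotone(q) -> bool:
    """Q(A,B) and B a subset of B2 imply Q(A,B2)."""
    return all(q(a, b2) for a in SETS for b in SETS if q(a, b)
               for b2 in SETS if b <= b2)


def downward_monotone(q) -> bool:
    """Q(A,B) and B2 a subset of B imply Q(A,B2)."""
    return all(q(a, b2) for a in SETS for b in SETS if q(a, b)
               for b2 in SETS if b2 <= b)


def symmetric(q) -> bool:
    return all(q(a, b) == q(b, a) for a in SETS for b in SETS)


def predicted_intervener(q) -> dict:
    return {
        "all quantifiers (Honcoop 1998; Haida 2007)": True,
        "monotonicity (approximating Mayr 2013)": not upward_monotone(q),
        "weak quantifiers (approximating Li & Law 2016)": symmetric(q),
    }


for name, q in [("most", most), ("no", no)]:
    print(name, predicted_intervener(q))
```

Both approximated criteria separate most from no, whereas the all-quantifiers view does not. This is the sense in which the most/no comparison can decide among some, but not all, of the theories.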
3 EXPERIMENTS
3.1 Experiment 1
We conducted three experiments, all taking the form of an acceptability judgment task. Section 3.1 deals with Experiment 1, with the aim of testing whether acceptability depends on the nominal versus adverbial divide within wh-expressions, as well as the focus versus quantifier divide within potential interveners.
3.1.1 Methods
Participants A total of 51 participants (39 females and 12 males) were recruited for Experiment 1. All participants were college-educated (non-linguistics majors) Mandarin speakers who had resided in their place of birth prior to college enrollment. None of them spoke another (non-Mandarin) Sinitic language. We controlled this factor to guarantee there was no interference from a closely related language. The average age of participants was 24 years (20 to 38 years, SE = 2.61). All participants registered for a 2-credit course and received extra credit as (non-monetary) compensation for taking part in the task. Three participants (two females, one male) were excluded from analysis due to low filler accuracy (69%, 81%, 65%). 48 participants remained in the analysis.
Stimuli The experimental stimuli followed a 3×3 factorial design. Three types of wh-expressions were tested: the why-adverb, (non-why) wh-adverbs and what. These were crossed with three interveners: an only-NP, a no-NP and a most-NP. The resulting nine combinations are illustrated in (14) (Chinese characters were displayed during the experiment).

Target sentences followed a Latin square design.9 The 81 target sentences (9 sets of lexical items × 3 wh-expressions × 3 interveners) were divided into three lists, and participants were assigned to three groups accordingly, each group rating one list. Each list contained 27 target sentences. Table 1 shows how the conditions were assigned to the subject groups.
Table 1. Assignment of sentences across the nine conditions to subject groups in Experiment 1
| Subject group | wh-expression | Intervener |
|---|---|---|
| Group 1 | whadv | no NP |
| Group 1 | what | most NP |
| Group 1 | why | only NP |
| Group 2 | whadv | most NP |
| Group 2 | what | only NP |
| Group 2 | why | no NP |
| Group 3 | whadv | only NP |
| Group 3 | what | no NP |
| Group 3 | why | most NP |
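The assignment in Table 1 is a cyclic Latin square over wh-expression and intervener. The Python sketch below reconstructs it. Three parts are assumptions of the sketch, chosen so that the output reproduces Table 1:

- the cyclic-shift formula;
- the intervener ordering;
- the reading that each list contains all nine lexical sets in each of its group’s three conditions (since 27 = 9 × 3).

```python
WH = ["whadv", "what", "why"]
# Ordering chosen (by us) so that a cyclic shift reproduces Table 1.
INTERVENERS = ["no NP", "most NP", "only NP"]
N_LEXICAL_SETS = 9


def group_conditions(g: int) -> list[tuple[str, str]]:
    """Group g (0-based) pairs the i-th wh-expression with intervener (i + g) mod 3."""
    return [(wh, INTERVENERS[(i + g) % len(INTERVENERS)]) for i, wh in enumerate(WH)]


def build_lists() -> dict[str, list[tuple[int, str, str]]]:
    """One list per group: every lexical set in each of the group's conditions."""
    return {
        f"Group {g + 1}": [(item, wh, intervener)
                           for item in range(1, N_LEXICAL_SETS + 1)
                           for wh, intervener in group_conditions(g)]
        for g in range(len(INTERVENERS))
    }


lists = build_lists()
for g in range(len(INTERVENERS)):
    print(f"Group {g + 1}", group_conditions(g), len(lists[f"Group {g + 1}"]))
```

Every (lexical set, wh-expression, intervener) triple occurs in exactly one list, and each wh-expression meets each intervener in exactly one group.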
To obtain the baseline acceptability of each participant for the target sentences, we also included another 81 control sentences, i.e. sentences where wh-items scramble over focus/quantifier items. An example of the control sentence corresponding to (14a) is (15).

Twenty-four participants rated the target stimuli, and another 24 participants rated the control items. Participants rating the controls were assigned to groups in the same way.
We separated the presentation of the target stimuli from the controls. This is because showing both sentences in the same session could lead to a priming effect, introducing a judgment bias among participants depending on which item occurred first and which second. By separating out the targets and the controls, we were able to avoid such possible bias.
In our design, half of the items in the wh-adverbial condition took the form of a how-question (as is illustrated by (14d)), and the other half took the form of a where-question exemplified in (16).

The literature treats the Mandarin locative adjunct zai nali as involving a divisible internal structure (a prepositional head taking a nominal argument, cf. Huang 1982, Tsai 1994). In this sense, despite being an adjunct in the theta-theory sense, zai nali meets the nominal criterion, on a par with the argumental what-phrase. Testing where-questions alongside the non-divisible zenme ‘how’-questions thus allows us to determine whether any acceptability difference between what-questions and wh-adverbial questions reflects a distinction in nominality, or should instead be explained by a contrast between syntactic arguments and adjuncts (see Experiment 3 below for further distinctions within Mandarin how-questions).
Furthermore, we controlled our lexical choice to ensure only a strictly causal interpretation was available for the why-question items: Mandarin causal adverb weishenme can have a purpose-denoting ‘for what’ interpretation (e.g. Tsai, 1994). The causal reading is exemplified by (17). (18) illustrates a purposive reading. See Starke (2001) and Chapman & Kucerova (2016) for a similar ambiguity found in English why.


The exceptional wide-scope property of the why-adverb, however, is motivated only for the causal reading (Jin, 2019; Stepanov & Tsai, 2008; Tsai, 1994). To determine whether acceptability in why-questions exhibits its own pattern, it is therefore necessary to avoid ambiguous readings (e.g. the stative predicate in 14a rules out a purposive reading).
Moreover, in order to avoid using the perfective verbal suffix -le, generic/habitual sentences were employed to abstract from tense and aspectual specifications. We did this to rule out the possibility that subjects reject a negative quantifier sentence because they do not like perfective aspectual marking in negative environments. Such incompatibility has been independently pointed out in the Mandarin literature (Jin & Chen, 2016; Li & Thompson, 1981; Soh, 1998). For instance, the example (19), where meiyou+NP ‘no NP’ combines with verbal -le, is found to be odd by some speakers.10

Procedure The (subjective) judgment task was implemented on Wenjuanxing (https://www.wjx.cn/). The experiment took place in a quiet setting. Participants sat in front of a computer screen and filled in a language background survey. They then read the instructions and started the task. Six practice items were presented first, followed by the main trials. The stimulus items were pseudo-randomly interspersed with twice as many fillers serving as distractors, using the randomization software Mix. Each participant thus rated 27 critical items and 54 filler items, a total of 81 items. Half of the fillers were grammatical and half ungrammatical, which allowed us to identify inattentive participants. The grammaticality of all fillers was checked by three native speakers from Beijing, China. Participants rated each sentence on a 5-point Likert scale, with 1 being ‘very unnatural’ and 5 ‘very natural’. Each sentence item was shown on its own page: the sentence appeared on the upper half of the page, and the scores (1-5) were aligned horizontally on the lower half, with a round checkbox underneath each score. After choosing a score by clicking the box of choice, the participant clicked the CONTINUE button to proceed to the next sentence. The entire experiment, including the background survey and the task, lasted around 15 minutes.
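The text does not specify the constraints under which Mix interleaved the items. As one hypothetical illustration, the Python sketch below interleaves 27 critical items with 54 fillers so that no two critical items are ever adjacent. It does this by placing each critical item in a distinct gap between fillers. The item labels and the adjacency constraint are assumptions; they do not describe the study’s actual randomization settings.

```python
import random


def interleave(critical: list, fillers: list, seed=None) -> list:
    """Shuffle both item types and put each critical item into its own
    gap between fillers, so no two critical items are adjacent."""
    if len(critical) > len(fillers) + 1:
        raise ValueError("not enough fillers to separate the critical items")
    rng = random.Random(seed)
    crit, fill = list(critical), list(fillers)
    rng.shuffle(crit)
    rng.shuffle(fill)
    # Gap k sits just before filler k; gap len(fill) is the final position.
    gaps = set(rng.sample(range(len(fill) + 1), len(crit)))
    crit_iter = iter(crit)
    order = []
    for k in range(len(fill) + 1):
        if k in gaps:
            order.append(next(crit_iter))
        if k < len(fill):
            order.append(fill[k])
    return order


critical = [f"C{i}" for i in range(1, 28)]  # 27 critical items (hypothetical labels)
fillers = [f"F{i}" for i in range(1, 55)]   # 54 fillers (hypothetical labels)
trial_order = interleave(critical, fillers, seed=1)
```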
3.1.2 Results
We examined the judgment ratings and the accuracy rates for fillers. A response was treated as erroneous when an ungrammatical filler received a score of 3 or above. Responses to grammatical fillers were counted as accurate when they received a score above 3. The overall accuracy rate for the filler sentences was 96%: 98% for grammatical fillers and 94% for ungrammatical ones. One lexical set of items was excluded from our analysis, because some participants gave low ratings to control sentences within the set, which they reported was due to unnatural prosody.11 We therefore report the dataset without this lexical set, so only eight comparisons are relevant for interpreting the results. Note, however, that adding this set back into our data did not alter the patterns in our statistical results.
We now report the results of Experiment 1, with which we seek to determine whether the focus/quantifier phrases we tested are interveners, and further whether a separation can be found between nominal and adverbial wh-expressions.
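The filler scoring rule can be stated compactly. In the Python sketch below, the function names are illustrative. A rating of 3 counts as an error for both filler types. With equally many grammatical and ungrammatical fillers, as in the design, overall accuracy is the average of the two rates. This is consistent with the reported figures: (98% + 94%) / 2 = 96%.

```python
def filler_correct(rating: int, grammatical: bool) -> bool:
    """Grammatical fillers are correct at 4-5; ungrammatical fillers rated
    3 or above are errors, so they are correct only at 1-2."""
    if not 1 <= rating <= 5:
        raise ValueError("ratings are on a 1-5 scale")
    return rating > 3 if grammatical else rating < 3


def accuracy(responses) -> float:
    """Proportion of correct responses among (rating, grammatical) pairs."""
    responses = list(responses)
    return sum(filler_correct(r, g) for r, g in responses) / len(responses)
```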
To determine whether there is an intervention effect, it was necessary to compare the response of each target with its control. That is, we need to assess whether a target with an intervening configuration is more degraded than the corresponding scrambled configuration (we thank the editor for pointing out the need to consistently assess target-control comparisons, instead of assessing target sentences alone). Figure 1 shows the means and confidence intervals for the target and control items arranged by condition.

Figure 1. Mean ratings of the target and control items across the nine conditions in Experiment 1. A rating of 1 corresponds to the judgment ‘very unnatural’ and a rating of 5 to the judgment ‘very natural’. Error bars show 95% confidence intervals.
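The target-control comparison amounts to computing, per condition, the mean target rating minus the mean control rating. A clearly negative difference signals an intervention effect. The Python sketch below does this descriptively. It attaches a normal-approximation 95% interval to the difference of two independent means. This is a simplifying assumption: it ignores the clustering by participant and item that the mixed models account for. The text also does not specify how the error bars in Figure 1 were derived.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def target_minus_control(rows):
    """rows: iterable of (wh, intervener, group, rating), group in {'target', 'control'}.
    Returns {(wh, intervener): (difference, (ci_low, ci_high))}."""
    cells = defaultdict(list)
    for wh, intervener, group, rating in rows:
        cells[(wh, intervener, group)].append(rating)
    result = {}
    conditions = {(wh, x) for wh, x, _ in cells}
    for wh, x in sorted(conditions):
        t, c = cells.get((wh, x, "target")), cells.get((wh, x, "control"))
        if not t or not c or min(len(t), len(c)) < 2:
            continue  # need at least two ratings per cell for a standard error
        diff = mean(t) - mean(c)
        se = sqrt(stdev(t) ** 2 / len(t) + stdev(c) ** 2 / len(c))
        result[(wh, x)] = (diff, (diff - 1.96 * se, diff + 1.96 * se))
    return result
```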
The data for the targets and the controls were then subject to a series of statistical tests (See our complete results here). We ran two types of statistical analysis: One in a frequentist framework and the other in a Bayesian framework. For the frequentist analysis, we constructed a series of cumulative link mixed models (CLMM) with participant and item as random effects, wh-expression (what, whadv, and why), intervener (most-NP, no-NP, only-NP), group (target, control) as fixed effects, as well as by-participant random slopes for the effect of wh-expression and intervener.12 Analyses were performed using the ordinal package in R (Christensen, 2023). Post hoc tests were conducted using the emmeans package with Tukey test (Lenth, 2022). We will interpret p-values below the conventional threshold of .05 as evidence against the null hypothesis, and p-values above the conventional threshold of .05 as a failure to reject the null hypothesis.
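The following Python sketch shows how a cumulative link model maps parameters to rating probabilities. With a logit link, the ordinal package parameterizes the probability of a rating at or below category j as the inverse logit of (θ_j − η). Here the θ_j are ordered thresholds. η is the linear predictor: the fixed effects plus, in a mixed model, the participant and item random effects. The sketch turns hypothetical thresholds and values of η into the probabilities of the five Likert categories. It illustrates the parameterization only and fits nothing. The threshold values in `THETA` are made up.

```python
from math import exp


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))


def category_probs(thresholds: list[float], eta: float) -> list[float]:
    """P(Y = k) for k = 1..K under P(Y <= j) = inv_logit(theta_j - eta)."""
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [inv_logit(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


def expected_rating(probs: list[float]) -> float:
    return sum(k * p for k, p in enumerate(probs, start=1))


THETA = [-2.0, -0.5, 0.5, 2.0]  # hypothetical cut-points for a 5-point scale
for eta in (0.0, 1.0):
    print(eta, [round(p, 3) for p in category_probs(THETA, eta)])
```

A larger η shifts probability mass toward higher ratings. A negative coefficient for target relative to control would therefore correspond to degraded acceptability in the intervening configuration.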
Because a failure to reject the null hypothesis cannot be interpreted as evidence in support of the null hypothesis, we also carried out a Bayesian analysis to evaluate the null hypothesis directly. For this analysis, we fitted a series of Bayesian models to ratings as a function of wh-expression (reference level “wh-adverbial”), intervener (reference level “most-NP”), group (reference level “control”) and their interactions, using Stan (Stan Development Team, 2021) via the R package brms (Bürkner, 2017, 2018, 2021). Leave-one-out cross-validation (LOOCV; Vehtari et al. 2017) was used to examine the relative fit of competing models. The LOO Information Criterion (LOOIC) quantifies the estimated predictive error of a model; a smaller LOOIC indicates better prediction (Vasishth et al., 2018; Vehtari et al., 2012, 2017). We chose not to compute Bayes factors because we wanted to avoid the influence that different priors may have on the results of model comparisons. In contrast to Bayes factors, approximate cross-validation allows us to carry out inference by evaluating the predictive performance of competing models, without the need to specify a prior for the parameter we are testing (see Vasishth et al. 2018, 24-26 for a detailed discussion). For the differences between conditions, we calculated 95% credible intervals (CrIs) and the posterior probability $P(\delta > 0)$ that a difference $\delta$ is larger than zero. Following Franke & Roettger (2019), we interpret $P(\delta > 0)$ greater than 0.95 as compelling/strong evidence that the two conditions differ, and $P(\delta > 0)$ less than 0.95 as insufficient evidence that they differ. We also interpret $P(\delta > 0)$ at 0.95 as inconclusive (as the data are equally likely under both theories).
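The two Bayesian summaries can be illustrated with a short sketch. Assuming one already has paired posterior draws for two conditions (in practice these would come from a fitted brms model; the draws below are toy numbers), the 95% credible interval of δ is read off the 2.5% and 97.5% quantiles of the draw-by-draw differences, and P(δ > 0) is the proportion of positive differences. The looic helper assumes pointwise elpd_loo values are already available (the loo machinery estimates these with Pareto-smoothed importance sampling, which is not reimplemented here); LOOIC is −2 times their sum. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def quantile(sorted_xs: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted sample."""
    pos = q * (len(sorted_xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def posterior_difference(draws_a: Sequence[float],
                         draws_b: Sequence[float]) -> Tuple[float, float, float]:
    """95% credible interval and P(delta > 0) for delta = a - b.

    draws_a and draws_b must be paired (taken from the same posterior samples),
    so that each difference is itself a posterior draw of delta.
    """
    if len(draws_a) != len(draws_b) or not draws_a:
        raise ValueError("need equally many, non-zero paired draws")
    delta = sorted(a - b for a, b in zip(draws_a, draws_b))
    p_positive = sum(d > 0 for d in delta) / len(delta)
    return quantile(delta, 0.025), quantile(delta, 0.975), p_positive


def looic(pointwise_elpd: Sequence[float]) -> float:
    """LOOIC = -2 * elpd_loo; lower values indicate better expected predictive accuracy."""
    return -2.0 * sum(pointwise_elpd)


lo, hi, p = posterior_difference([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 5.0])
print(p)  # 0.75
print(looic([-1.0, -2.0]))  # 6.0
```

With real posteriors one would use thousands of draws; the four toy draws only make the arithmetic easy to follow.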
In the following, we will report the core statistical results based on the two types of analysis.
Figure 2 shows the nine pairwise target-control comparisons as regression coefficients with 95% confidence intervals. The figure offers a visual means of assessing individual target-control differences: a coefficient whose confidence interval does not cross zero is statistically significant. A significant coefficient with a positive sign means that the control is rated significantly better than the corresponding target. The results shown are based on the frequentist models; our Bayesian analysis yielded converging results.

Estimated coefficients and 95% confidence intervals for the nine target-control comparisons in Experiment 1.
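Reading Figure 2 amounts to checking whether each interval excludes zero. The sketch below assumes normal-approximation (Wald) intervals, estimate ± z·SE; the intervals in Figure 2 may have been computed differently (e.g. by profile likelihood), and the estimates and standard errors in the example are made up. Following the convention stated above, a positive coefficient means the control is rated higher than the target.

```python
from statistics import NormalDist
from typing import Tuple


def wald_interval(estimate: float, se: float, level: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation (Wald) interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return estimate - z * se, estimate + z * se


def read_comparison(estimate: float, se: float) -> str:
    """Interpret a target-control coefficient (positive = control rated higher)."""
    lo, hi = wald_interval(estimate, se)
    if lo > 0:
        return "significant: control rated higher than target"
    if hi < 0:
        return "significant: target rated higher than control"
    return "not significant: interval crosses zero"


print(read_comparison(1.0, 0.25))  # significant: control rated higher than target
print(read_comparison(0.3, 0.2))   # not significant: interval crosses zero
```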
As Figure 2 shows, Experiment 1 provides substantial evidence that the no-NP and the only-phrase are interveners. Target sentences with a no-NP or an only-phrase all received significantly lower ratings than their corresponding control sentences (i.e. no-what, no-whadv, no-why, only-what, only-whadv, only-why). In contrast, there was a lack of evidence that target sentences with a most-NP induced an intervention effect. In two of the three target-control comparisons, no statistical difference was observed (most-what, most-whadv). Furthermore, although why-question targets were rated lower than their controls (most-why), these targets still received a mean rating of 3.97 out of 5; that is, they tended towards being acceptable. Post hoc comparisons indicated no difference among most-target items across the three wh-questions, all of which fell within the natural/acceptable range. This stands in contrast with no-target and only-target items, whose mean ratings tended towards being unacceptable (2.22/5 and 2.07/5, respectively). Taken together, we take the data from Experiment 1 to indicate that the most-NP is not an intervener.
Figure 2 suggests that most patterns differently from no and only. We now directly explore the effect of intervener type by investigating whether target-control differences (i.e. the differences between the control and the condition with an intervening configuration) vary across interveners. Here target-control differences were calculated by subtracting the rating of a control item from the rating of the corresponding target item, so a negative value indicates that the target sentence is less preferred than its control counterpart. Given the pattern in Figure 2, we would expect a difference between most and no, and between most and only, but not between no and only. Table 2 provides the results of pairwise comparisons among intervener types. We list p-values alongside posterior probabilities (P), so that the effects from the frequentist and Bayesian models can be evaluated side by side.
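The difference-of-differences logic behind Table 2 can be sketched descriptively with raw condition means. This is only an illustration under stated assumptions: the study's comparisons were model-based (CLMM and Bayesian models), not computed from raw means, and the ratings below are made up. Each condition is keyed by a hypothetical (intervener, wh-expression) pair.

```python
from statistics import mean
from typing import Dict, List, Tuple

Condition = Tuple[str, str]  # (intervener, wh-expression), e.g. ("no", "what")


def target_control_differences(target: Dict[Condition, List[int]],
                               control: Dict[Condition, List[int]]) -> Dict[Condition, float]:
    """Mean target rating minus mean control rating for each condition.

    A negative value means the in-situ (intervening) target is rated lower
    than its scrambled control.
    """
    missing = set(target) - set(control)
    if missing:
        raise KeyError(f"no control ratings for {sorted(missing)}")
    return {cond: mean(target[cond]) - mean(control[cond]) for cond in target}


def intervener_contrast(diffs: Dict[Condition, float], wh: str, first: str, second: str) -> float:
    """Difference between two interveners' target-control differences for one wh-expression."""
    return diffs[(first, wh)] - diffs[(second, wh)]


# Made-up ratings, for illustration only.
target = {("most", "what"): [4, 4, 5], ("no", "what"): [2, 2, 3]}
control = {("most", "what"): [4, 5, 5], ("no", "what"): [4, 5, 5]}
diffs = target_control_differences(target, control)
print(round(intervener_contrast(diffs, "what", "most", "no"), 3))  # 2.0
```

A positive most-vs-no contrast here means the most-target loses less relative to its control than the no-target does, which is the kind of asymmetry the pairwise comparisons in Table 2 test.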
Table 2. Statistical results based on analysis under frequentist and Bayesian models in Experiment 1: comparison between intervener types.
| wh-expression | Analysis | most-NP vs. no-NP | most-NP vs. only-NP | no-NP vs. only-NP |
|---|---|---|---|---|
| whadv | CLMM | p < 0.05 * | p < 0.05 * | p = 0.95 |
| whadv | Bayesian | P = 1 | P = 1 | P = 0.62 |
| what | CLMM | p < 0.001 *** | p < 0.001 *** | p < 0.001 *** |
| what | Bayesian | P = 1 | P = 1 | P = 1 |
| why | CLMM | p < 0.001 *** | p < 0.001 *** | p = 0.76 |
| why | Bayesian | P = 1 | P = 1 | P = 0.75 |
As Table 2 indicates, both statistical analyses converge in showing that most is qualitatively distinct from no and only. Across wh-questions, we see consistent evidence that target-control differences vary between most and no, as well as between most and only.13 For these two comparisons, the frequentist tests consistently yielded p-values below the conventional threshold, indicating that the null hypothesis should be rejected. The Bayesian analyses consistently yielded a posterior probability of 1, indicating sufficient evidence for a difference between conditions.
With the interveners now established, we next investigate whether target-control differences vary across wh-expressions. The result directly contributes to our goal of understanding whether intervention effects are conditioned by a nominal-adverbial division. Table 3 presents results from pairwise comparisons among the three categories of what, whadv and why. Again, p-values are presented in parallel with posterior probabilities.
Table 3. Statistical results based on analysis under frequentist and Bayesian models in Experiment 1: comparison between wh-categories.
| Intervener | Analysis | whadv vs. what | whadv vs. why | what vs. why |
|---|---|---|---|---|
| most-NP | CLMM | p = 0.71 | p < 0.05 * | p < 0.05 * |
| most-NP | Bayesian | P = 0.77 | P = 1 | P = 1 |
| no-NP | CLMM | p = 0.47 | p < 0.05 * | p < 0.001 *** |
| no-NP | Bayesian | P = 0.86 | P = 1 | P = 1 |
| only-NP | CLMM | p = 0.88 | p < 0.01 ** | p < 0.01 ** |
| only-NP | Bayesian | P = 0.67 | P = 1 | P = 0.99 |
Our findings are twofold. First, neither statistical analysis provides evidence that the pattern of intervention is sensitive to the distinction between what and (non-why) wh-adverbials. Second, there is substantial evidence that why induces a more severe intervention effect than both what and (non-why) wh-adverbials.
In sum, the core findings above are that no and only are interveners, whereas most is not. The evidence does not support the claim that intervention is sensitive to a nominal-adverbial distinction. We also found that why differs from the other wh-types, giving rise to more severe degradation in negative quantifier and focus environments.
3.2 Experiment 2
We now proceed to the second experiment, which aims to test whether acceptability is sensitive to the distinction between what and the which-phrase. As mentioned in section 2.1.2, it has been claimed in the literature that wh-nominals do not behave homogeneously in intervening configurations. For Mandarin, some authors have reported that which-nominals receive a more degraded judgment than what (e.g. Beck, 2006; Kim, 2002). Experiment 2 evaluated this claim formally in order to establish a full picture of the intervention pattern of wh-nominals. As in Experiment 1, we investigated the acceptability of wh-phrases within an intervening configuration, taking into account three types of scopal elements (focus phrase, most-NP, no-NP).
3.2.1 Methods
Participants. A total of 53 participants (36 females and 17 males) were recruited for Experiment 2, none of whom had taken part in Experiment 1. All participants were college-educated (non-linguistics majors) Northern Mandarin speakers who lived in their place of birth prior to college enrollment. None spoke a second Sinitic language. The average age of participants was 24 (range 18 to 30 years, SE = 2.36). The participants received a remuneration of CNY 30 for completing the task. Two of them (both female) were excluded from analysis due to low filler accuracy rates (61% and 69%).

Stimuli. The experimental stimuli followed a 2 × 3 factorial design. Two types of wh-expressions were tested: what and the which-NP. These were crossed with three interveners (only-NP, no-NP and most-NP), yielding six combinations. The no-which configuration is given in (20) for illustration.

In order to control for any effect of presentation order, targets and controls were again presented to separate groups of participants. The 54 target sentences (9 sets of lexical items × 2 wh-expressions × 3 interveners) were divided into 3 lists of 18 targets each, and the participants who rated the target sentences were divided into 3 groups following a Latin square design. The 54 semantically identical counterparts with wh-scrambling were included as control sentences to obtain the baseline acceptability of native speakers. For example, the control sentence corresponding to (20) is (21). Three separate groups of participants were recruited as the control group.

Nine sets of lexical materials were used, following the same design as in Experiment 1. Generic/habitual sentences were employed to abstract away from tense and aspectual specifications, given that perfective aspectual marking is potentially incompatible with the negative quantifier meiyou ‘no’-NP.
In addition to the 18 target sentences per participant, 36 fillers (18 grammatical and 18 ungrammatical) were included to mask the purpose of the experiment and to make sure that participants understood the task. The filler sentences in Experiment 2 were taken from those in Experiment 1.

Procedure. The procedure of Experiment 2 was the same as that of Experiment 1. The entire experiment, including the background survey and the task, lasted around 10 minutes.
3.2.2 Results
Figure 3 presents the mean ratings of the target (mean: 2.75/5) and control items (mean: 4.25/5) across the six conditions. The acceptability results from the three pairs of what-conditions are consistent with those obtained in Experiment 1 (cf. Figure 1). Moreover, which-conditions patterned with what: target sentences involving the most-quantifier (most-what, most-which) received high mean ratings, tending towards being acceptable, whereas target sentences involving the no-quantifier and the only-phrase (no-what, only-what, no-which, only-which) received low ratings, tending towards being unacceptable.

Mean ratings of target and control sentences across the six conditions in Experiment 2: A rating of 1 corresponds to the judgment ‘very unnatural’ and a rating of 5 corresponds to the judgment ‘very natural’. Error bars show 95% confidence intervals.
The visual pattern thus suggests that what- and which-questions are rated very similarly. Likelihood ratio tests on our CLMM models revealed that wh-expression (what or which) did not have a significant effect on participants’ ratings (p = 0.92). Our Bayesian analysis based on approximate leave-one-out cross-validation further revealed insufficient evidence that the difference between the which-condition and the what-condition was larger than zero ($P(\delta > 0) = 0.58$). We conclude that acceptability is not sensitive to the distinction between what and the which-phrase.
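Because wh-expression has only two levels in Experiment 2, its main effect corresponds to a single parameter, and a likelihood ratio test compares nested models differing by one degree of freedom. The sketch below assumes the two log-likelihoods are already available from fitted models (e.g. from comparing two clmm fits with anova() in R) and that the models differ only in that one parameter; the log-likelihood values are made up. For one degree of freedom, the chi-square upper tail has the closed form erfc(√(x/2)), so no statistics library is needed.

```python
import math
from typing import Tuple


def lr_test_1df(loglik_reduced: float, loglik_full: float) -> Tuple[float, float]:
    """Likelihood ratio test for nested models differing by one parameter.

    The statistic 2 * (logLik_full - logLik_reduced) is referred to a
    chi-square distribution with 1 degree of freedom, whose upper tail
    probability is erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("full model should not fit worse than the nested model")
    return stat, math.erfc(math.sqrt(stat / 2.0))


stat, p = lr_test_1df(-102.0, -100.0)
print(stat, round(p, 4))  # 4.0 0.0455
```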
3.3 Experiment 3
We now present the third experiment, which looks into whether sentence acceptability in Mandarin how-questions depends on the particular reading associated with the how-phrase zenme (see our discussion in section 2.1.2). Two interpretations are relevant: zenme can denote a set of manners or, alternatively, a set of instruments. It has been proposed that this interpretive distinction maps onto a distinction in the mechanism by which the Q operator binds its wh-restriction (Tsai 1994 and subsequent works). The core assumption is that zenme is bound unselectively under an instrumental reading, whereas the Q operator of zenme must move under a manner reading. Theories that predict a nominal-adverbial difference thus also predict that the two zenmes exhibit different intervention patterns, since they are subject to different binding mechanisms. To enable a refined evaluation of such theories, we presented identical zenme ‘how’-stimuli to separate groups of participants, manipulating the preceding contexts to ensure that a manner reading was obtained for one group and an instrumental reading for the other.14
3.3.1 Methods
Participants. Thirty-six participants took part in Experiment 3 (29 females and 7 males), and all were retained in the analysis. All participants were college-educated (non-linguistics majors) Northern Mandarin speakers who lived in their place of birth prior to college enrollment. None spoke another Sinitic language. The average age of participants was 20 years (range 18 to 27 years, SE = 2.59). Participants received course credit for taking part in the task.

Stimuli. Experiment 3 followed a 2 × 3 factorial design. Two contexts were tested (instrumental and manner), crossed with three types of interveners (most, no, and only). (22a) illustrates a how-question receiving an instrumental interpretation, enforced by a preceding context that explicitly enumerates alternative instruments. (22b) illustrates a how-question situated in a manner context.

Target sentences followed a Latin square design. The 36 target sentences (6 sets of lexical items × 2 contexts × 3 interveners) were divided into 6 lists, to ensure that the same participant did not read both a manner sentence and an instrumental sentence, and did not encounter more than one sentence with the same contextual description. This means, for instance, that the same participant would not read both a no-sentence and the corresponding most-sentence. Each participant rated 6 target sentences and 12 filler sentences. The fillers included 6 grammatical items and 6 ungrammatical items (all wh-questions). All fillers were contextualized so as to be consistent with the target setting.

Procedure. The procedure was the same as in Experiments 1 and 2, except that participants were first shown a description of the context, and the target stimulus was shown on the following page. The entire experiment lasted around 10 minutes.
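One list construction consistent with the constraints stated for Experiment 3 is sketched below. It reads those constraints, together with the earlier statement that the manner and instrumental stimuli went to separate groups of participants, as keeping context constant within a list, and rotates interveners across lexical sets within each context. The rotation scheme and all names are our assumptions; the study's actual lists are not reproduced here.

```python
from itertools import product
from typing import List, Tuple

CONTEXTS = ("instrumental", "manner")
INTERVENERS = ("most", "no", "only")
Trial = Tuple[int, str, str]  # (lexical set, context, intervener)


def build_lists(n_sets: int = 6) -> List[List[Trial]]:
    """Rotate interveners across lexical sets, with one block of lists per context.

    Each list contains every lexical set exactly once and only one context,
    so no participant sees two versions of the same set or both contexts.
    """
    lists = []
    for context in CONTEXTS:
        for shift in range(len(INTERVENERS)):
            lists.append([
                (s, context, INTERVENERS[(s + shift) % len(INTERVENERS)])
                for s in range(n_sets)
            ])
    return lists


lists = build_lists()
print(len(lists), [len(lst) for lst in lists])  # 6 [6, 6, 6, 6, 6, 6]
```

With 6 lexical sets and 3 interveners, each list contains each intervener twice, and across the 6 lists every one of the 36 set × context × intervener combinations appears exactly once.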
3.3.2 Results
Figure 4 shows the mean ratings for the target sentences in different contexts.

Mean ratings of the three instrumental-manner comparisons in Experiment 3: A rating of 1 corresponds to the judgment ‘very unnatural’ and a rating of 5 corresponds to the judgment ‘very natural’. Error bars show 95% confidence intervals.
The ratings matched those found in the corresponding how-targets in Experiment 1. Under the no-condition and the only-condition, how-questions exhibited robust intervention effects and their ratings both fell below 3, indicating that situating these target sentences under a context did not lead to the circumvention of intervention.
Both of our statistical analyses corroborate the visual pattern. Likelihood ratio tests on our CLMM models revealed no main effect of context on ratings (p = 0.29), indicating that the acceptability of target sentences was regulated not by instrumental or manner context but only by the type of intervener. Our Bayesian analysis based on approximate leave-one-out cross-validation found insufficient evidence that the difference between sentences in the manner context and those in the instrumental context was larger than zero ($P(\delta > 0) = 0.85$). We conclude that acceptability in Mandarin how-questions does not depend on the particular reading associated with the how-phrase zenme.
3.4 Discussion
The study presented three experiments examining how Mandarin speakers rated sentences bearing an intervening configuration, i.e. the configuration where a wh-phrase is c-commanded by a focus-sensitive or a quantificational expression. Experiment 1 tested the effect of wh-expressions and interveners on sentence acceptability. Experiment 2 looked specifically into whether there was an acceptability difference between what and the which-phrase. Experiment 3 investigated whether different interpretations of how had an influence on the ratings.
Results from Experiment 1 revealed degraded judgments associated with no and only. No degradation was found when most occurred: all target sentences with most tended towards being acceptable (mean rating 4.20/5). In contrast, all no-targets (2.23/5) and all only-targets (2.07/5) received mean ratings below 3, and a significant difference was observed between each of these targets and its corresponding control (no-what, no-whadv, no-why, only-what, only-whadv, only-why).
Within sentences with a no- or only-NP, which showed an intervention effect, we found no significant difference between what- and wh-adverbial questions in the rating differences between target sentences and their corresponding controls. However, why-questions induced stronger intervention than both wh-adverbial and what-questions.
Finally, no significant difference was found between what-questions with no-quantifier as intervener and those with only-NP as intervener. Similarly, the degradation attested in wh-adverbial questions was not sensitive to whether the intervener was a no-quantifier or an only-NP.
In sum, we did not find an intervention effect under the environment of the most-quantifier. No-NPs and focus phrases induced similar degradation in acceptability, in which case we found no distinction between wh-nominals and wh-adverbials. Why-questions differed from the other two wh-question types: They gave rise to more severe degradation in negative quantifier and focus environments, compared to wh-nominals or non-why adverbials. The experiment has several implications for the theoretical debate about intervention effects.
First, our findings in Experiment 1 are not compatible with the prevailing position in the Mandarin literature, according to which wh-intervention in Mandarin is identified with the violation of a locality constraint stating that feature movement cannot take place across a generalized quantifier or a focus-sensitive operator (Aoun & Li, 1993; Kim, 2002; Law, 2001; Soh, 2005; Tsai, 2008; Yang, 2011). Since this approach further assumes that feature movement applies to wh-adverbials but not to wh-nominals, it predicts that the pattern of intervention exhibits a nominal-adverbial distinction. As an alternative to the feature movement approach, intervention effects can be explained without relying on movement (e.g. Beck, 2006; Li & Law, 2016; Mayr, 2013; Pesetsky, 2000; Tomioka, 2007). Such accounts predict that both the base-generation of Q and feature movement give rise to intervention. Given that no significant difference was found between the intervention patterns of nominal and adverbial questions in Experiment 1, our results are consistent with the latter view, which treats wh-nominals and wh-adverbials uniformly. At the same time, the results lead us to conclude that wh-intervention patterns differently from (wh-in-situ) island effects, which show a nominal-adverbial asymmetry. They thus additionally bear upon the theory of wh-in-situ, in particular on attempts to unify island effects and intervention effects (cf. Cheng, 2009; Cheng & Rooryck, 2000; Huang et al., 2009; Tsai, 1994).
Second, our results from Experiment 1 establish that there are quantificational interveners in Mandarin (and that there is variation within quantifiers, to which we turn below). We found that the no-quantifier induces intervention similar to focus phrases, and this generalization applies to wh-nominal and wh-adverbial questions alike. We take this finding to support the claim that quantifier-induced intervention and focus intervention should receive the same treatment (Aoun & Li, 1993; Cheng & Rooryck, 2000; Law, 2001; Tsai, 1994, 2008). The converging pattern of intervention has a further consequence. Some theories have claimed that wh-nominals do invite a degraded judgment, but only in focus environments (Hagstrom, 2006; Kim, 2002, 2006; Soh, 2005; Yang, 2009, 2011). It is then argued that wh-nominal questions are subject to a separate, focus-induced effect (Soh, 2005; Yang, 2009, 2011). For instance, Yang (2011) proposes that focus phrases are subject to a competition effect in wh-nominal questions. Yang assumes that the focus-sensitive operator and the Q operator have the same scope position (due to their shared focus feature), yet this slot cannot accommodate more than one operator in Mandarin Chinese (e.g. Yang, 2011). Quantifiers are not assumed to contain a focus-sensitive operator, and they take scope separately (Yang, 2011, 62).15 Our evidence, in which the no-quantifier patterns with focus phrases, fails to support this postulation.
Also, our results establish that quantifiers are not a homogeneous class of interveners. Specifically, most-NPs induce no intervention effects (unlike no-NPs). The finding corroborates introspective judgments already given in analyses for Mandarin (Jin, 2019; Ko, 2005; Li & Law, 2016) and across languages (Beck, 1996; Grohmann, 2006; Mayr, 2013). The experiment data thus provide empirical support for these accounts. An underlying idea in these analyses is that certain logical properties of most explain its non-intervener status, while there is disagreement over the exact property that is relevant. For example, Li & Law (2016) predict a divide between strong and weak quantifiers. As no-NPs are ruled out for being a weak quantifier, a distinction between most and no is predicted. Mayr (2013) characterizes quantificational interveners along monotonicity lines. Monotone increasing quantifiers such as most are existential quantifiers by nature and hence additive (cf. also Grohmann, 2006). Yet another line of research, starting with Beck (1996), argues that most may denote a group individual, hence should be analyzed as a plural indefinite (cf. also Jin, 2019; Pafel, 1991). To decide between these analyses, more quantifier types need to be tested. We leave an experimental investigation towards a proper delineation of quantificational interveners to another paper.
An important further finding is that why-questions are more degraded than the other wh-questions in the presence of a c-commanding no/only. We propose to explain why’s degradation in terms of a different constraint (Jin, 2019; Ko, 2005; Li, 2011). This explanation crucially rests on the well-established observation that the why-adverb across languages (including East Asian languages) receives a no-trace construal, i.e. its base position is the position where it takes scope, assumed to be the Q operator’s scope position (Bromberger, 1992; Lawler, 1971; Soare, 2021; Stepanov & Tsai, 2008; Tomioka, 2009). For Mandarin, it has specifically been proposed that the why-phrase is base-generated in the C domain rather than the T domain, unlike run-of-the-mill wh-adjuncts. As a result, why is above the scope of the focus-sensitive operator or the generalized quantifier (Li, 2011). The structure where no or only c-commands why is impossible, as no/only and their operators must stay within the domain of the Q operator. Adopting this view, we assume that why-questions give rise to a different pattern of judgment from the rest of wh-questions because the structure of no/only c-commanding why is ruled out by a distinct mechanism (cf. Jin, 2022).
Results from Experiment 2 showed that both what-questions and which-questions exhibited intervention (barring cases with most). The evidence is again compatible with theories that predict no nominal-adverbial difference. We may assume that the which-phrase is bound unselectively like what, or alternatively it undergoes feature movement as suggested in Pesetsky (2000). In either case, intervention is expected to arise.16
Recall that in section 2.1 we showed that covert phrasal movement gives rise to a non-intervening structure and derives an unproblematic compositional interpretation, in a similar manner to overt scrambling. Given that no distinction is found between what and which, there is no evidence that the mechanism of LF phrasal movement applies to what (a possibility suggested in Beck 2006 based on a small-scale informal survey). More generally, the pattern in our experiments is that both wh-nominals and wh-adverbials are sensitive to interveners. Hence we conclude that covert phrasal movement is not available to Mandarin wh-in-situ.17
In Experiment 3, we did not observe an effect of context on acceptability ratings. As in the previous two experiments, interveners regulated the judgment of target sentences: most-NP sentences were rated significantly higher than sentences with a no-NP or an only-NP. The judgment is not sensitive to whether the context enforces a manner reading or an instrumental one. Assuming a category-based taxonomy of nominal phrases, the nominal-adverbial asymmetry approach treats a manner-denoting how-question in an intervening configuration as problematic, while assigning a legitimate structure to its instrumental counterpart. The resulting prediction of a manner-instrumental distinction is thus incompatible with our experimental finding. On the other hand, accounts that treat wh-nominals and wh-adverbials uniformly correctly capture our finding: we may still adopt the categorial classification and distinguish manner how from instrumental how, but intervention is then derived for both categories.
4 CONCLUSION
In our study, comparisons between wh-nominals and wh-adverbials in intervening configurations (i.e. with a c-commanding quantifier or focus phrase) reveal no significant difference in acceptability. This result holds when a fine-grained distinction within wh-nominals and wh-adverbials, as well as the particular readings obtained, are considered. The evidence thus challenges the long-standing notion that non-movement mechanisms such as unselective binding are pivotal to understanding the pattern of intervention in Mandarin wh-in-situ. It is compatible with theories that are insensitive to the interpretive differences between nominals and adverbs (e.g. Beck, 2006; Pesetsky, 2000).
An issue that remains to be explained is why the degradation found in why-questions is more severe than that in the other wh-questions. One potential direction is to understand the intervention phenomenon in terms of a cognitive constraint during sentence processing. We assume that the real-time comprehension process underlying association with focus is a process of dependency resolution (Beck & Vasishth, 2009), which can be characterized as a retrieval process (Lewis & Vasishth, 2005; Vasishth & Lewis, 2006). The parser encounters a focus phrase and seeks to retrieve a preceding focus-sensitive operator as its target. The success of a retrieval process depends on the available cognitive resources. Consider now the processing of the configuration [Op1...Op2...F2...F1], which involves the association of two focus phrases with their respective operators. Upon encountering F2, the parser searches backwards to retrieve its immediate c-commanding focus operator Op2. While this dependency is being resolved, additional resources must be expended on maintaining a sufficient activation level (e.g. above the threshold of memory decay) for the higher operator Op1. This allows Op1 to be retrieved later, when the second focus phrase F1 is encountered, so that the dependency resolution can be completed. This process of simultaneously resolving one dependency and holding a second operator in memory imposes a taxing burden on the available resources and results in greater retrieval difficulty.
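The memory load in the nested configuration [Op1...Op2...F2...F1] can be made concrete with a deliberately simplified, stack-based abstraction. This is not the activation-based retrieval model of Lewis & Vasishth (2005), which models decay and interference rather than a stack; it only pairs each focus phrase with the closest still-unresolved operator and counts how many operators must be held at once. The token labels and the function name are hypothetical.

```python
from typing import List, Tuple


def resolve_focus_dependencies(tokens: List[str]) -> Tuple[List[Tuple[str, str]], int]:
    """Pair each focus phrase with the closest preceding unresolved operator.

    Tokens are labels such as "Op1" or "F2", read left to right. Returns the
    operator-focus pairs and the peak number of operators held unresolved at once.
    """
    pending: List[str] = []
    pairs: List[Tuple[str, str]] = []
    peak = 0
    for token in tokens:
        if token.startswith("Op"):
            pending.append(token)
            peak = max(peak, len(pending))
        elif token.startswith("F"):
            if not pending:
                raise ValueError(f"{token} has no preceding operator")
            pairs.append((pending.pop(), token))
        else:
            raise ValueError(f"unexpected token {token!r}")
    if pending:
        raise ValueError(f"unresolved operators: {pending}")
    return pairs, peak


print(resolve_focus_dependencies(["Op1", "Op2", "F2", "F1"]))
# ([('Op2', 'F2'), ('Op1', 'F1')], 2)
print(resolve_focus_dependencies(["Op1", "F1"]))
# ([('Op1', 'F1')], 1)
```

In the nested case, two operators are pending when F2 is reached, so Op1 must be maintained while Op2 is retrieved; a single dependency never requires more than one.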
If a processing understanding of the intervention effect is feasible, then retrieval difficulty should be subject to individual cognitive (e.g. working memory) differences, and there should be cases of successful retrieval. We suggest this prediction allows us to potentially account for the pattern where the intervention effect gives rise to less degraded judgment than the constraint pertaining to why-questions. The latter effect involves an impossible structure, which should not be violable. Exploring such an explanation falls beyond the scope of this paper, but it opens room for future empirical tests.
Funding
This work was funded by The Philosophy and Social Sciences Office of the Government of Shanghai (Grant No. 2021BYY007).
Acknowledgements
We wish to express our deep gratitude to the editor Jakub Dotlačil and the two anonymous reviewers of the Journal of Semantics for their detailed, comprehensive and insightful comments. The paper is much improved because of them. We are further indebted to Ting Ma, Yifan Wang, Jun Chen, Chengru Dong and Yu-Fu Chien for their invaluable feedback and assistance. As usual, all remaining errors are our own.
Footnotes
The following abbreviations are used in the glossing: CLF: classifier; COP: copula; DEM: demonstrative; EXP: experiential aspect; LOC: locative; NEG: negative, negation; POSS: possessive; PRF: perfect; PRT: particle.
A highly relevant discussion is provided in Tomioka (2007), which reports vast speaker variation in grammaticality judgments on intervention effects. Tomioka shows that while there is consensus that the scrambled version is better than its unscrambled counterpart, speakers differ greatly across individual intervener types and wh-phrase types (see also Tomioka 2009). He further points out the methodological problem posed by the lack of an agreed-upon pattern of intervention (Tomioka, 2007, 1572-1573): “...the existing analyses take the steps that are not uncommon when we encounter messy judgments: Make certain decisions (what is grammatical and what is not) based on one’s own judgment, and proceed to theorization. Disagreement in grammatical judgments is often noted but it is rarely treated as a target of explanation.”
We use italicized forms such as why, how and what abstractly, to refer both to the English wh-expressions and to their counterparts in other languages.
Note that we have not exhausted the space of possible theories that could be employed to explain the pattern under investigation. Our goal is to test the prediction of a nominal-adverbial asymmetry. Should this prediction fail to hold, deciding on the best explanation among all available alternatives would go beyond the scope of the current study. That said, we want to be clear that there is a plethora of approaches predicting no difference between nominal and adverbial wh-phrases, not all of which we cover here. More generally, the prediction of a nominal-adverbial asymmetry rests on two independent assumptions: (1) Nominals and adverbials are subject to distinct binding mechanisms; (2) Intervention is induced when a condition on movement is violated. In principle, any account that does not adopt both assumptions simultaneously yields a different prediction. This would be the case, for instance, for an account assuming that the Q operator binds its variables in the same way regardless of whether the wh-phrase is nominal (e.g. Huang, 1982).
Here we assume with Beck (2006) (see also Jacobs, 1983; Büring & Hartmann, 2001) that only adjoins to verbal projections and extended verbal projections (clausal nodes). This also holds for the so-called ‘constituent only’, which on the surface adjoins to a DP.
While we use Beck (2006) as a representative account, alternative proposals can do the job equally well. For example, in another focus evaluation account with broader empirical coverage, Li & Law (2016) argue that composing the focus semantic contribution of the focus-indexed DP with the set of alternatives contributed by wh yields a set of sets of alternatives. The focus-sensitive operator only, located above the verbal projection, then needs to make use of the focus semantic value of the verbal projection. However, this results in a type problem, as only cannot evaluate a set of sets of alternative properties, and an interpretation failure ensues. Many other proposals similarly derive an interpretation problem when an offending operator is closer to wh than Q is (e.g. Cable, 2010; Haida, 2007; Kotek, 2017; Mayr, 2013), because the configuration results in triviality (Haida, 2007), compositional failure (Cable, 2010; Kotek, 2017) or a presupposition violation (Mayr, 2013). These proposals therefore predict that intervention will arise regardless of whether the Q operator moves to its LF position or is merged there directly. In both cases, structures such as (6) are the input to compositional interpretation.
Beck (2006) assumes that the variable that replaces the focused expression is a distinguished variable. Wh-phrases are also distinguished variables, whereas traces are not. Distinguished variables are assigned a focus semantic value by a separate assignment function (in addition to receiving an ordinary semantic value from a regular assignment function).
The suggestion is based on Pesetsky’s (2000) solution for the above-mentioned contrast in English between which and what in intervening and superiority-violating environments. Note that Beck (2006) cautions that the reported distinction between what and which in English remains controversial. It is also unclear how the differences between the two phrases could be derived from their semantics. See Beck (2006, 27-29) for more discussion.
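The type clash in Li & Law's (2016) account can be stated schematically as follows. The notation is a simplification of our own: we use extensional types and write ℘ for the power set, whereas Li & Law's own formulation may differ in detail (e.g. by using intensional types).

```latex
\begin{align*}
[\![\mathrm{VP}]\!]^{f} &\subseteq D_{\langle e,t\rangle}
  && \text{no \textit{wh} in VP: a set of alternative properties, the input \textit{only} requires}\\
[\![\mathrm{VP}]\!]^{f} &\subseteq \wp(D_{\langle e,t\rangle})
  && \text{\textit{wh} in VP: a set of sets of alternative properties, which \textit{only} cannot evaluate}
\end{align*}
```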
In the experiment, each participant sees 3 of the 9 conditions, as specified in Table 1. We did not opt for a full Latin square rotation because our experiment includes both targets and controls, which together comprise 18 conditions (9 conditions with an intervening configuration and 9 conditions of control stimuli). A full Latin square design would thus involve 18 groups of participants, which we considered less practical to implement than our current design.
The literature notes that speakers may reject verbal -le in the environment of negation words like mei or bu. Some speakers we consulted further reject the co-occurrence of verbal -le with negative subject quantifiers like meiyouren ‘nobody’. As one anonymous reviewer points out, many speakers nevertheless accept such sentences. We excluded sentences with verbal -le from our experimental design out of an abundance of caution, given the likelihood that this sentence type might elicit degraded judgments from a subset of speakers for reasons independent of the intervening configuration.
Specifically, the control sentences with what received low acceptability ratings. We note that the what-questions in this lexical set have a monosyllabic verb in sentence-final position, preceded by a noun phrase (e.g. a no-phrase).
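The arithmetic behind the 18-group figure follows from how a standard Latin square rotation works: each group receives a cyclic shift of the condition assignment, so the number of groups equals the number of conditions. The Python sketch below assumes, for illustration only, one lexical set per condition; the function name and item count are hypothetical and do not reproduce the partial design in Table 1.

```python
def latin_square_lists(n_conditions, n_items):
    """Build one presentation list per participant group.

    Group g sees item i in condition (i + g) % n_conditions. Each group
    sees every item once, and across all groups every item appears in
    every condition exactly once (when n_items == n_conditions).
    """
    return [
        [(item, (item + group) % n_conditions) for item in range(n_items)]
        for group in range(n_conditions)
    ]


# 9 target conditions + 9 control conditions, as in the design described above.
N_CONDITIONS = 9 + 9
lists = latin_square_lists(N_CONDITIONS, n_items=N_CONDITIONS)
print(len(lists))  # number of participant groups a full rotation requires: 18
```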

Corresponding sentences from the other lexical sets contain disyllabic verbs (e.g. ai-chi ‘like eat’) and tend to be rated much better. The low acceptability of sentences with a monosyllabic sentence-final verb could be due to an idiosyncratic constraint in Mandarin that is tangential to our purpose. Studies on the prosodic structure of Mandarin have assumed that a monosyllabic word is unfooted and hence joins an adjacent foot to form a prosodic word (Chen, 2000). When a monosyllabic verb is sentence-final, it must join the preceding syllables, as there are no following syllables. There is, however, an independent tendency against an NV combination within the same prosodic word. The favored combinations are those in which the verb is preceded by a modal, a PP, etc. (e.g. Cao, 2003; Cao & Zhu, 2002). A disyllabic verb is unproblematic, as it forms its own foot and does not join the preceding one.
In our statistical analysis, ‘group’ refers to the type of stimulus (target vs. control). This sense should be distinguished from the Latin square groups mentioned earlier.
Here we treated the whadv condition as uniform, as our analysis revealed no difference between how and where.
The literature has discussed another how-form in Mandarin, zen(me)yang (Lin, 1992; Stepanov & Tsai, 2008; Tsai, 1994, 2008). Our experiment did not use zen(me)yang because our participants did not find it natural in non-intervening configurations. A majority of participants judged the following configuration, where zen(me)yang is c-commanded by a bare plural subject, to be degraded.

Participants reported that (2) is acceptable.

Those who accept (1) still prefer (2). We suspect that a dialectal difference may underlie the contrast. Suffice it to say that there is a dialect of (Northern) Mandarin that consistently employs zenme as the how-form. Importantly, when appropriate contexts were provided, participants could obtain both a manner reading and an instrumental reading for the zenme-question (cf. Yang, 2011). For this reason we consistently used zenme-questions in Experiment 3.
Besides being incompatible with the empirical findings presented in this study, a competition-based explanation faces additional conceptual challenges. If the focus operator and Q compete for the same slot, then it is unclear how to explain the fact that wh-phrases can circumvent intervention by scrambling to the left of focus phrases. There seems to be a landing site for scrambled wh-phrases that does not compete with the focus operator. If so, it would be mysterious why Q cannot make use of this site when moving at LF.
Proponents of a quantifier-focus divide in wh-nominal questions have additionally claimed that focus DPs differ from focus adverbs: the constituent only phrase is an intervener, whereas adverbial only induces no intervention (Soh, 2005; Yang, 2009, 2011). We leave it to future work to test whether such a fine-grained distinction holds within focus expressions.
An account that predicts the nominal-adverbial asymmetry runs into difficulties in view of the data. If we follow the default understanding and assume that the nominal which-phrase is unselectively bound like what, then it is predicted to be acceptable within an intervening configuration. Alternatively, if we allow that which undergoes feature movement, then it should give rise to more degraded judgments than what. Neither prediction is borne out by our findings.
Previous analyses of intervention such as Soh (2005) have relied on the assumption that wh-nominal phrases move at LF. Aside from lacking empirical support, postulating covert phrasal movement in Mandarin faces additional challenges. Phrasal movement is standardly diagnosed using antecedent-contained deletion (ACD) (Pesetsky, 2000). However, ACD tests yield inconclusive results in Mandarin, as they apply to very few verb types (Cheng & Rooryck, 2000; Yang, 2011). Moreover, even where ACD has been shown to apply, a difference in acceptability between wh-nominals and wh-adverbials cannot be clearly established.
References