Corpus-Aided Language Learning for Chinese EFL Learners: An Exploratory Study

Effective language instruction is essential for ESL/EFL students' language development and improvement. Language researchers, educators, and professionals have investigated the preliminary impact of input that has been purposefully manipulated for language instruction. Effective language instruction has largely been explored from teachers' perspectives, whereas learners' perceptions of language learning seem to be under-researched. For this reason, this study explores EFL learners' perceptions of corpus-aided instruction through qualitative research. Thirty-seven Chinese EFL college students at a Midwestern university in the United States participated in this study. Data from writing conferences and interviews were collected and analyzed through thematic analysis. Findings showed that the Chinese EFL learners felt corpus-aided instruction was helpful in two respects: (1) clarifying logic and (2) organizing the structure in academic writing. However, they also reported some challenges in corpus-aided instruction. This study offers new insight into the usefulness of corpus-aided instruction by drawing much-needed attention to EFL learners' L2 writing development and improvement. Based on the preliminary findings, suggestions and implications are discussed.


Introduction
Acquiring second language (L2) academic writing skills may be a daunting task for EFL and/or ESL learners, particularly when English is the medium of instruction and practice. Developing rhetorical and grammatical skills is essential for quality writing. For these reasons, researchers in the fields of linguistics and education have turned their attention to phrases for effective L2 instruction. Systemic Functional Linguistics (SFL; Halliday, 2004) is the major theoretical framework in this study. SFL is an approach to linguistics that regards language as a system of social functions, representing linguistic resources for interpreting meaning through words and structures (Halliday & Matthiessen, 2004). Adapting to a rhetorical style is important for gaining membership in academia. Teaching academic vocabulary is likely to help EFL learners improve their language proficiency in general, ultimately enhancing their academic writing skills and increasing their confidence in writing in English. In this regard, corpus linguistics is an essential field for investigating various linguistic features of vocabulary. Corpus linguistics synthesizes various common word combinations (e.g., lexical clusters and collocations). An early study by Johns (1994) used "data-driven learning" (p. 296) to revolutionize language learning in technical and methodological terms by utilizing machine-readable texts from corpora. In light of beliefs and findings about second language acquisition and learning, this study attempts to extend the scope of L2 writing by exploring EFL learners' perceptions of corpus-aided instruction as the development of flexible meaning-making language skills across contexts. In addition, this study integrates the theoretical and pedagogical fragments of SFL, corpus linguistics, and L2 writing as basic concepts to form an "interdisciplinary framework for SLA" (The Douglas Fir Group, 2016, p. 19).
Therefore, the purpose of this study is to explore EFL learners' perceptions of corpus-aided instruction.

Research Questions
To fulfill the purpose of the study, the following overarching research question guided the investigation: How did Chinese EFL learners perceive corpus-aided instruction?

Systemic Functional Linguistics (SFL)
Language is a system, and a variety of language combinations are yielded through the representational tool of "system networks" (Halliday, 2004, p. xiii). Systemic Functional Linguistics (SFL) is an approach to linguistics that regards language as a social semiotic system; that is, as the social action of making meaning according to functions and purposes in context. Halliday (2004) claims that a central theoretical principle is that any act of communication involves choices. From the SFL perspective, language has evolved under the pressure of the particular functions that the language system has to serve. Halliday (2004) explains that the basic functions of language are making sense of human experience and acting out social relationships, both of which are achieved via metafunctions. Hence, human experience is transformed into meaning. Specifically, language provides a theory of human experience, and the lexicogrammar of every language is dedicated to that function; lexicogrammar, a term unique to SFL and coined by Halliday (1961), refers to the continuity between grammar and lexis. Halliday (2004) categorizes metafunctions into three types: ideational, interpersonal, and textual. The ideational function is divided into the experiential and the logical. When grammar is used to perform an action, it serves the interpersonal metafunction, which is interactive and personal. The last component is the textual metafunction, a mode of meaning relating to the construction of text. Halliday (2004) specifies that the textual metafunction appears as a clearly delineated motif within the grammar because the ideational and interpersonal metafunctions rely on the ability to build up sequences of discourse, organize the discursive flow, and create cohesion and continuity (pp. 30-31). Halliday (2004) maintains that all languages involve the three metafunctions: one interprets experience, one enacts social relations, and one weaves these two functions together to create text.
Since the three metafunctions are considered to come into being simultaneously, language must also be able to bring these meanings together. This is the role of structural organization, which involves three notions: grammatical, semantic, and contextual. Halliday also argued that the textual metafunction is distinct from both the ideational and the interpersonal because language creates a semiotic world of its own through the textual metafunction. Halliday's notion of lexicogrammar is rooted in phraseology and functional linguistics. SFL offers the prospect of viewing second language learning as the development of flexible meaning-making second language capacities across contexts.
Lexicogrammar is the system of wording, representing the linguistic resources for construing meanings through words and structures (Halliday, 2004). The lexicogrammatical approach was adopted by the proponents of systemic functional grammar (Halliday & Matthiessen, 2004), encompassing a much broader set of phenomena than mainstream lexicology. Ngo and Luu (2022) employed lexicogrammar as one of their theoretical frameworks in order to examine lexicogrammatical realizations in EFL learners' English conversations. Lexicogrammatical analyses vary along two dimensions: paradigmatic versus syntagmatic and lexical versus grammatical. Cross-tabulating these two dimensions yields four basic combinations: syntagmatic & lexical, syntagmatic & grammatical, paradigmatic & lexical, and paradigmatic & grammatical. The syntagmatic & lexical feature focuses on the tendency for words to involve others in their immediate vicinities, such as collocations and lexical bundles. With the theoretical perspective of SFL, this study used Halliday's lexicogrammar, a combination of vocabulary and grammar, as a major notion.

Corpus linguistics
Corpus linguistics has opened up a variety of research investigations into linguistic features, such as vocabulary, semantic domains, and grammatical structures. Through corpus-based investigations, many language researchers (e.g., Biber, 2006; Biber et al., 1999; Conrad, 1999; Cortes, 2004, 2006, 2013) rigorously investigated the co-occurrence of seemingly similar structures and patterns (e.g., in the [Noun Phrase] of), serving different functions in different contexts. Corpus data is recognized as valuable for gaining knowledge of language patterning and perspectives on the language system (Biber & Conrad, 2001; Hunston, 2002; Sinclair, 1991; Stubbs, 2001). Corpora are useful to both students and teachers in education. Students can benefit from corpus-based instruction because corpora can provide usage-based information in the form of concordances, frequency, distribution, and collocation (Boulton, 2010). Yoon and Hirvela's (2004) and Yoon's (2008) corpus-based studies reveal that students gained confidence in writing and lexicogrammatical awareness. Yoon and Jo's (2014) case study also showed that the needs-based approach (i.e., instruction modified based on the students' needs) to corpus use in L2 writing was useful in guiding learners to restructure their errant knowledge of a language. Sinclair (2004) indicates two corpus developments relevant to classroom use in the language teaching profession: first, the teaching of lexical and phraseological patterns (McCarthy & Carter, 2004; Schmitt, 2004) and, second, language variety, genre, and register (Poole, 2016; Reppen et al., 2002). Hunston (2002) asserts that language classroom teachers should encourage students to explore corpora for themselves by comparing language features. A few studies focused on teachers' perspectives, emphasizing teacher education.
O'Keeffe and Farr (2003) claim that teachers play an essential role in recontextualizing corpora and mediating between corpus-based instruction and the needs of the learners in the classroom. Farr's (2008) study provided pedagogical advice on corpus use in the classroom and showed the student teachers' positive predisposition towards the use of corpora. Poole's (2016) study demonstrates a corpus-aided approach to the teaching and learning of rhetoric in an L2 undergraduate writing course to examine linguistic and rhetorical variation. As shown in the prior literature, corpus-based and corpus-aided writing instruction have the potential to improve L2 learners' academic writing in terms of a variety of components, such as linguistic features, genre, and rhetoric.
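To make the kinds of usage-based information mentioned in this section more concrete (frequency of multiword sequences and concordance lines), the following is a minimal Python sketch. It is offered only as an illustration under stated assumptions: the toy corpus is invented, the function names (tokenize, ngram_counts, concordance) are hypothetical, and the code does not represent the corpus tools or materials used in this study.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every contiguous n-word sequence in the token list."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


def concordance(tokens, phrase, width=3):
    """Return keyword-in-context lines: up to `width` tokens on each side of the phrase."""
    target = phrase.lower().split()
    k = len(target)
    lines = []
    for i in range(len(tokens) - k + 1):
        if tokens[i:i + k] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + k:i + k + width])
            lines.append(f"{left} [{phrase.lower()}] {right}".strip())
    return lines


# Invented toy corpus, for illustration only (not data from the study).
toy_corpus = (
    "Learners are more likely to revise drafts. On the other hand, "
    "novice writers are more likely to avoid new phrases. "
    "On the other hand, practice helps."
)

tokens = tokenize(toy_corpus)
four_grams = ngram_counts(tokens, 4)

# Most frequent four-word sequences in the toy corpus.
print(four_grams.most_common(2))

# Concordance lines showing each occurrence of a phrase in context.
for line in concordance(tokens, "on the other hand"):
    print(line)
```

In this toy text, the sequences "are more likely to" and "on the other hand" each occur twice, so they surface as the most frequent four-word sequences. Real corpus work would rely on much larger corpora and typically on dispersion or association measures in addition to raw frequency.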

The context & participants
The current study was conducted in the English as a Second Language (ESL) Program housed at a Midwestern university in the United States. The participants were Chinese EFL college students who were required to take appropriate writing courses based on their placement test results. Their ages ranged from 18 to 24. The class met twice a week, with each session running 80 minutes. A total of 37 students participated in this study. Their participation was voluntary, and students who did not consent were not included. The students participated in various types of qualitative data gathering, such as writing conferences, interviews, and member checking. All the participants joined one-to-one writing conferences with their instructor to discuss the final papers required to complete the writing course. Among them, 15 students participated in the interviews, which were intended to discover how corpus-aided instruction influenced their learning of lexicogrammatical components.

Data collection
One-to-one writing conferences
One one-to-one writing conference was held per participant, for a total of 37 writing conferences. Each writing conference lasted 30 minutes on average and was audio-recorded and transcribed verbatim. A common feature of the writing conferences was that the instructor and each student met one-to-one to discuss the student's own drafts. Writing conferences enable L2 learners to participate in their learning about writing and to view their L2 writing development (Ewert, 2009). On the one hand, the conferences were aimed at helping students improve their writing for the course and thereby enhance their development. On the other hand, as Soter and Smith (2016) explain, a dialogue about writing, such as through the writing conference, can encourage the transfer of skills learned in one setting to new academic and professional contexts and further improve students' writing skills. Teachers need to "create learning environments and assignments that explicitly encourage transfer by cultivating skills and habits of mind that best aid their transition" (Soter & Smith, 2016, p. 3). Writing conferences can help students make the transition from basic academic writing to field-specific writing. In the writing conferences, think-back questions were asked to help the participants reflect on their learning experiences with corpus-aided instruction. Such questions are useful in obtaining specific information about prior experiences. Krueger and Casey (2015) maintain that focusing on the past increases the reliability of participants' responses because think-back questions allow them to concentrate on what they have done as opposed to what can be done in the future.

Semi-structured interviews
Semi-structured interviews were conducted once or twice with the individual participants, depending on their schedules and willingness to participate. A total of twenty interviews were conducted, ranging from 20 to 100 minutes (the average was 60 minutes). Each interview was audio-recorded and transcribed verbatim. The semi-structured interviews were essential in understanding how the learners perceived corpus-aided instruction.
To ensure both open-ended possibilities and the gathering of information necessary for the study, a semi-structured approach was adopted. A semi-structured interview is considered most valuable when the researcher understands the fundamentals of a research context from the insider's point of view (Fetterman, 2009). The interviews contained open-ended questions, thus encouraging the participants to freely discuss personal learning experiences (Corbin & Strauss, 2008). They also contained focused questions tied to the students' experience of corpus-aided instruction. The overall style was conversational so as to help participants relax and thus produce more valuable data. The interviews asked about the effectiveness of the corpus-aided instruction (e.g., "What was the most useful instruction in the corpus-aided instruction?", "What were the most challenging things in academic writing?"). The interviews helped the researcher to understand the participants' perceptions and some of the challenges they encountered.

Data analysis
All the collected data from the interviews and writing conferences were analyzed to identify recurring patterns or themes through thematic analysis. Thematic analysis enables researchers to identify, analyze, and report patterns or themes within the data (Braun & Clarke, 2006). The data were examined through Braun and Clarke's (2006) six phases of conducting a thematic analysis: (1) becoming familiar with the data, (2) generating initial codes, (3) searching for themes, (4) reviewing themes, (5) defining and naming themes, and (6) producing the report. Constant comparative thematic analysis was especially important during this phase of data analysis. The analysis applied specific techniques, including open coding to break down the data as an initial step, axial coding to define the concepts and categories extracted from the open coding, and selective coding to deliberately select one aspect as a core category (Corbin & Strauss, 2008; Punch, 1998; Strauss & Corbin, 1998). Open coding (a.k.a. "microanalysis", Corbin & Strauss, 2008, p. 58) breaks data apart, generates abstract conceptual categories, and labels them with substantive codes. Axial coding is used to group the discrete codes according to conceptual categories, interconnect these categories, and produce a set of propositions with theoretical codes (Corbin & Strauss, 2008). Axial coding considers the condition that gives rise to the data, the context in which it is embedded, the action strategies by which it is carried out, and the consequences of those strategies. Selective coding integrates the developing analysis and focuses on core codes (Corbin & Strauss, 2008; Punch, 1998; Strauss & Corbin, 1998).

EFL learners' overall perceptions of corpus-aided instruction
The main purpose of the thematic analyses was to shed light on students' experience of corpus-aided instruction. The Chinese EFL learners felt corpus-aided instruction was helpful in terms of two things: (1) clarifying logic and (2) organizing the structure in academic writing. What follows are student comments from interviews and writing conferences relative to these areas:

(1) Clarifying logic

I used the expressions in Source Paper 2, such as 'on the other hand.' I tried to find the words and use them in my paper. I think my paper usually doesn't have logic, so it (the practice of using the words) was good. I think I can use the words in the sentences to have logic in the academic paper. (Student#6, interview)
(2) Organizing the structure in academic writing

I found the expressions helpful, such as 'is one of the' and 'are more likely to.' I used 'are more likely to' in my paper to indicate something is not guaranteed in the structure. They are good for the structure. (Student#13, writing conference)

Multiword phrases are helpful [in organizing and making] a structure in my writing. That's why I used 'as a result' because one is the reason, and another is the result. (Student#14, writing conference)
I look at the list of the expressions to match the structure for grammar. (Student#15, interview)

Most participants perceived the importance of receiving corpus-aided instruction and wanted to have more practice and exposure to corpus data. Specific comments on these themes appear below:

I found the corpus-aided instruction is interesting and want to have more lessons with it (corpus-aided instruction). (Student#33, interview)
Corpus-aided instruction was helpful to me. But in my case, I should use and practice more and more so that I become more familiar with the given phrases. (Student#22)

Several participants addressed the need for more practice. The students held a common belief that practice makes their writing better, so they felt this tenet should also be applied to corpus-aided instruction. Thus, they advocated explicit corpus-aided instruction as they saw value in such input.
In terms of their preferences, the major theme that emerged in several students' interviews and writing conference comments was the importance of providing sufficient usage examples and explanations of lexicogrammatical components in corpus-aided instruction. For example, one participant suggested:

If the instructor prepares sentences with multiword phrases, students match them appropriately. Also, you can briefly introduce multiword phrases with the meaning of each phrase at the beginning (Student#28, interview).

Another participant offered some suggestions emphasizing details of corpus-aided instruction:
Give specific instruction on multiword phrases that we are going to learn before getting to use them. It will be useful if you include the materials for explicit corpus-aided instruction (Student#27, interview).

Student#33 offered several instructional recommendations:
I think it's worthy and important to teach multiword phrases. I think you should include a list of multiword phrases in the lecture and make students familiar with them. Try to test whether they understand the use of multiword phrases. This (test) makes students pay more attention to the multiword phrases. Although I don't like tests, the test can improve the use of multiword phrases because they are important in academic writing. (Student#33, interview)

Collectively, these suggestions once again indicate that the participants saw value in language learning through corpus-aided instruction, and they preferred explicit instruction with plenty of corpus examples.

EFL learners' challenges regarding corpus-aided instruction
While the data revealed various positive responses about corpus-aided instruction, several participants expressed that they experienced some challenges in learning. They noted:

The difficult thing is the variety (of using multiword phrases). I want to use different multiword phrases, but it's difficult to remember them… It's very challenging for me to write with different multiword phrases. Maybe, they have different styles (functions) for me to use. I don't know why, but I easily and quickly forget the information of multiword phrases… Also, I make the same mistake. (Student#28, interview)

I used "what's more" a lot in my paper because I can't remember the multiword phrases. They don't automatically come to my mind… I am not really sure how to use another multiword phrase [that] can be added to my paper. (Student#33, interview)
It's difficult to remember and find the multiword phrases to rephrase or paraphrase. (Student#20, writing conference)

I'm not sure how to use multiple words. That's why I can't choose the correct words. (Student#25, writing conference)

I don't remember the multiword phrases, so I didn't use them. (Student#30, writing conference)

In fact, the participants tried to use a variety of multiword phrases learned in the instruction, produce a logical paper, and articulate ideas fluently. However, the application process was not an easy one. Several students (Student#20, #30, #33) were confused about the meaning of the phrases acquired through corpus-aided instruction, and this lack of understanding led them to forget the phrases easily. Cortes (2004) argues that one reason why L2 students may avoid using prefabricated phrases that are quickly and easily ready for use is that "students do not dare to risk the chance of making mistakes by using these expressions, which are unfamiliar or may convey different functions when used in academic prose" (p. 421). This suggests that the participants wanted more detailed or concrete instruction and more opportunities to practice what they were being taught. On the whole, it can be said that most challenges they encountered emerged from a lack of understanding of the target phrases in the language instruction. Some other comments also indicated challenges and difficulties associated with using multiword phrases in their academic writing:

I know the meaning of the words, but I don't know how to make them useful in academic papers. I need more time to understand how to use them. (Student#1, interview)

Multiword phrases I learned connect two sentences and make the paragraph more logical. But, it is not very easy to remember and use them, especially when I need to write something. (Student#2, writing conference)

Two patterns emerged from the collection of the students' comments concerning challenges: (1) concerns about the use of acquired multiword phrases and (2) perceived deficiencies in their second language knowledge that prevented applications of multiword phrases. Regarding the first pattern, several students (Student#1, #10, #12, & #18) expressed their concerns about using multiword phrases in an effective way. They recognized the value of these phrases in academic writing but doubted their ability to employ them successfully. This pattern may overlap with the second one, as some students (Student#4 & #21) attributed not using formulaic expressions in their papers to their deficiency or incompetence as foreign language learners.
While displaying positive results, the qualitative data also showed that student uptake from corpus-aided instruction is a complex matter. The participants' responses suggest that students need time and meaningful practice opportunities in order to process their language learning fully and thus use what they have learned productively. Teachers need to give considerable thought to how to offer exposure to such useful expressions in order for students to instantiate that input. Bui (2021) maintains that explicit instruction of multiword phrases (e.g., collocations) is essential for EFL learners. Well-designed activities that allow students to practice language use appear to be especially important for successful language learning and development. Flowerdew (2015) notes that corpus-based research and pedagogy with respect to both lexis and genre illustrate the tight links between corpus research and pedagogical applications of that research. While analyses of corpora are important, knowing how corpus materials can be used to promote learning is equally important. From the genre perspective, this includes using corpus input to perform various rhetorical moves that organize texts and allow them to reveal meaning clearly and systematically relative to the purpose of the genre involved (e.g., literature review, research report, and abstract). Corpus linguists and researchers (Cortes, 2013; Farr & O'Keeffe, 2019; Flowerdew, 2015; Hyland, 2008; Kanoksilapatham, 2005; Kashiha, 2015; Lin & Kuo, 2014) see lexical phrases as move-signaling entities. That is, certain combinations of words can be used to help readers see and understand the moves being made by writers, such as a phrase like "on the other hand" used to introduce a paragraph presenting information or a view contrary to what appeared in the previous paragraph. This perspective enables researchers to investigate writing components from the word level (i.e., multiword phrases) to the textual level (i.e., rhetorical moves in a genre).

Discussion
Corpus-aided instruction has been found to be advantageous for the acquisition of common usage patterns of words and phrases and for the improvement of academic writing skills (Boulton, 2010; Farr & O'Keeffe, 2019; Lin, 2016; Poole, 2020; Yoon & Hirvela, 2004). Farr (2008) indicates that students show a positive disposition towards corpus use, which contributes to the students' language awareness. Lin's (2016) corpus-aided study also shows that corpus instruction enhanced the learning attitudes of EFL learners. In the current study, the participants responded that multiword phrases from corpus-aided instruction are relevant and important components in academic writing. As L2 learners are more exposed to corpus-aided instruction, they may become aware of essential components of language and be encouraged to learn a foreign language more systematically. Farr and O'Keeffe (2019) also noted that corpus materials enable us to see the grammar that is learned in class being used in real contexts. Therefore, this study supports the claim that corpus-aided instruction fosters learners' language awareness and motivation to learn academic writing within the linguistic domain. To elaborate, the participants in this study indicated that corpus-aided instruction helped clarify logic and organize the structure in their academic research papers for the ESL writing course. The participants' perceptions that multiword phrases support logic and structure in academic writing are consistent with long-term language learning.
On the other hand, it should be noted that several participants expressed uncertain or vague views of their language learning in corpus-aided instruction, reporting that it had less impact on their retention of vocabulary knowledge, thus limiting the value of this inexplicit form of instruction. These less positive views could be due to the students' perceptions of vocabulary constituents. Researchers generally agree that learning the vocabulary of a foreign language may involve a slow and complicated acquisition process because L2 learners first need to understand the lexis and memorize the new words, and they need repeated exposure to the target words. Learners may learn words incidentally (Nagy et al., 1985) or intentionally (Laufer, 2003). The current study showed that there were perceived challenges regarding the form of corpus-aided instruction. As previously reported, some students indicated that explicit and clear corpus-aided instruction is necessary for effective language learning and use.

Conclusion
In closing, this study has shed new light on the usefulness of corpus-aided instruction by drawing much-needed attention to EFL learners' writing development and improvement. Despite some challenges the participants encountered, they applied the expressions acquired through corpus-aided instruction to their academic writing. This study identified potential benefits of using corpus-aided instruction in language learning and teaching.
This study had several limitations that need to be addressed in future research. The first limitation concerns generalizability (Shadish et al., 2002). The study did not employ random sampling, and the small sample size also limits the generalizability of the findings. However, Clarke (1995) argues that classroom researchers should strive for particularizability by supporting teachers through connections between the particular events of participants' lives and the findings of the research.
All in all, it is essential to expand the scope of corpus-aided instruction to develop and enhance EFL learners' academic writing. The essence of corpus-aided instruction is to provide linguistic resources that are beneficial to second language learners' academic writing skills. It is hoped that the findings of this study will be of interest and value to scholars and teachers working in areas such as corpus linguistics, EFL instruction, and second language writing.

Eunjeong Park is an Assistant Professor in the Department of English Language Education, College of Education at Sunchon National University, South Korea. Her research interests include language learning in the EFL context and the interdisciplinarity of teaching and learning in education.