Pretest Review
The Validation Unit at Cambridge ESOL collates and analyses the pretest material.
Listening and Reading pretests
All candidate responses are analysed to establish the technical measurement characteristics of the material, i.e. to find out how difficult the items are, and how they distinguish between stronger and weaker candidates. Both classical item statistics and latent trait models are used in order to evaluate the effectiveness of the material. Classical item statistics are used to identify the performance of a particular pretest in terms of the facility and discrimination of the items in relation to the sample that was used. Rasch analysis is used to locate items on the IELTS common scale of difficulty. In addition, the comments on the material by the staff at pretest centres and the immediate response of the pretest candidates are taken into account.
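The classical item statistics mentioned above can be illustrated with a short sketch: facility is the proportion of candidates answering an item correctly, and discrimination is commonly computed as the point-biserial correlation between the item score and the candidate's total score. The response data below are invented for illustration and are not actual pretest data.

```python
# Classical item statistics: facility and discrimination (point-biserial).
# All data are illustrative, not Cambridge ESOL's actual pretest figures.
from statistics import mean, pstdev

def facility(item_scores):
    """Proportion of candidates who answered the item correctly (0..1)."""
    return mean(item_scores)

def discrimination(item_scores, total_scores):
    """Point-biserial correlation between item score and total test score."""
    mi, mt = mean(item_scores), mean(total_scores)
    cov = mean((i - mi) * (t - mt) for i, t in zip(item_scores, total_scores))
    return cov / (pstdev(item_scores) * pstdev(total_scores))

# Invented pretest responses for one item: 1 = correct, 0 = incorrect,
# alongside each candidate's total score on the whole pretest.
item = [1, 1, 0, 1, 0, 1, 1, 0]
totals = [34, 30, 12, 28, 15, 31, 27, 10]

print(round(facility(item), 3))                 # → 0.625
print(round(discrimination(item, totals), 3))   # → 0.965
```

A high positive discrimination, as here, means the item tends to be answered correctly by stronger candidates and incorrectly by weaker ones, which is what the review meeting looks for.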
At a pretest review meeting, the statistics, feedback from candidates and teachers and any additional information are reviewed and informed decisions are made on whether texts and items can be accepted for construction into potential live versions. Material is then stored in an item bank to await test construction.
Writing and Speaking pretests
Separate batches of Writing pretest scripts are marked by IELTS Principal Examiners and Assistant Principal Examiners. At least two reports on the task performance and its suitability for inclusion in live versions are produced. On the basis of these reports, tasks may be banked for live use, amended and sent for further pretesting or rejected.
Feedback on the trialling of the Speaking tasks is reviewed by experienced examiners, who deliver the trialling tasks, and members of the item writing team who are present at the trialling sessions. The subsequent reports are then assessed by the paper chair and Cambridge ESOL staff.
Interpretation: Building on the data collected during pretesting, the Validation Unit at Cambridge ESOL collates and analyses all the test data. Every candidate response is analysed to establish the technical characteristics of the items, namely their difficulty and discrimination, using a range of professional statistical methods and tools. Items that pass validation are stored in the item bank for use in future live tests.
Banking of Material
Cambridge ESOL has developed its own item banking software for managing the development of new live tests. Each section or task is banked with statistical information as well as comprehensive content description. This information is used to ensure that the tests that are constructed have the required content coverage and the appropriate level of difficulty.
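As a rough illustration of what a banked record might contain, the sketch below pairs a content description with a difficulty value on the common scale. The field names, task IDs and difficulty values are all hypothetical; Cambridge ESOL's actual item banking schema is not public.

```python
# A hypothetical item bank record: content description plus statistics.
# Field names and values are invented for illustration only.
from dataclasses import dataclass

@dataclass
class BankedTask:
    task_id: str
    skill: str                 # "Listening" or "Reading"
    topic: str                 # content description
    item_format: str           # e.g. "multiple choice", "matching"
    rasch_difficulty: float    # location on the common difficulty scale

bank = [
    BankedTask("R-0117", "Reading", "environment", "matching", -0.42),
    BankedTask("R-0231", "Reading", "technology", "multiple choice", 0.35),
    BankedTask("L-0089", "Listening", "education", "multiple choice", 0.10),
]

# Query: Reading tasks within a target difficulty window.
candidates = [t for t in bank
              if t.skill == "Reading" and -0.5 <= t.rasch_difficulty <= 0.5]
print([t.task_id for t in candidates])   # → ['R-0117', 'R-0231']
```

Storing both kinds of information together is what lets a test constructor later filter the bank for tasks with the right content coverage and difficulty.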
Interpretation: Cambridge ESOL has its own dedicated item banking software for managing the construction of new test papers. Each test component is banked with a detailed content description as well as statistical information, which is used to ensure that the tests constructed from it have the required content coverage and an appropriate level of difficulty.
Standards Fixing
Standards fixing ensures that there is a direct link between the standard of established and new versions before they are released for use at test centres around the world.
Different versions of the test all report results on the same underlying scale, but band scores do not always correspond to the same percentage of items correct on every test form. Before any test task is used to make important decisions, we must first establish how many correct answers on each Listening or Reading test equate to each of the nine IELTS bands. This ensures that band scores on each test indicate the same measure of ability.
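The idea of equating raw scores to bands can be sketched as a lookup against version-specific cut scores. The cut scores below are invented for illustration; actual IELTS conversion tables vary from version to version and are not published per form.

```python
# A hedged sketch of converting raw scores to band scores once standards
# fixing has established cut scores for a version. Cut scores are invented.
from bisect import bisect_right

def raw_to_band(raw_score, cut_scores):
    """cut_scores holds the minimum raw score for bands 2..9;
    anything below the first cut score is band 1."""
    return bisect_right(cut_scores, raw_score) + 1

# Two hypothetical 40-item versions. Version B is slightly harder, so
# each band is reached with fewer correct answers.
version_a = [4, 8, 12, 16, 23, 30, 35, 38]
version_b = [3, 7, 11, 15, 21, 28, 33, 37]

print(raw_to_band(30, version_a))   # → 7
print(raw_to_band(28, version_b))   # → 7 (same band, fewer items correct)
```

This is the sense in which band scores on different forms "indicate the same measure of ability" even though the percentage of correct answers behind them differs.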
Interpretation: Before new live test versions are released to test centres worldwide, the standards fixing stage ensures a direct link between the grading standards of new and established versions. Since every version reports results on the same nine-band IELTS scale, standards fixing guarantees that, although the percentage of correct answers corresponding to a given band may vary between versions, the same band score always reflects the same level of ability.
Live Test Construction and Grading
Live Test Release
At regular test construction meetings, Listening and Reading papers are constructed according to established principles. Factors taken into account are:
• the difficulty of complete test versions and the range of difficulty of individual items
• the balance of topic and genre
• the balance of gender and accent in the Listening versions
• the balance of item format (i.e. the relative number of multiple choice and other item-types across versions)
• the range of Listening/Reading skills tested.
The item banking software allows the test constructor to model various test construction scenarios in order to determine which tasks should be combined to create tests that meet the requirements.
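One way such scenario modelling could work, sketched here under invented data and constraints, is a search over combinations of banked tasks for one whose overall difficulty and topic balance meet the targets.

```python
# A hedged sketch of "modelling test construction scenarios": brute-force
# search over combinations of banked tasks for sets that satisfy target
# constraints. Tasks, difficulties and constraints are all invented.
from itertools import combinations
from statistics import mean

# (task_id, topic, difficulty on the common scale)
tasks = [
    ("T1", "environment", -0.6), ("T2", "education", 0.2),
    ("T3", "technology", 0.5),   ("T4", "health", -0.1),
    ("T5", "environment", 0.4),  ("T6", "culture", -0.3),
]

def acceptable(combo):
    topics = [t[1] for t in combo]
    difficulties = [t[2] for t in combo]
    return (len(set(topics)) == len(topics)           # no repeated topic
            and -0.1 <= mean(difficulties) <= 0.1)    # target mean difficulty

# Model every 3-task scenario and keep those meeting the constraints.
scenarios = [c for c in combinations(tasks, 3) if acceptable(c)]
print(len(scenarios) > 0)   # → True: at least one combination qualifies
```

A real item banking system would use richer constraints (item format balance, skill coverage, accent and gender balance for Listening), but the principle of filtering candidate combinations against the requirements is the same.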
Data are collected routinely from live administrations and analysed both to confirm the accuracy of the initial grading process and to support additional investigations into quality assurance issues.
Interpretation: At regular test construction meetings, live IELTS papers are constructed and released according to the five established principles above, covering difficulty, balance and range of coverage. The item banking software helps the test constructors model different scenarios and determine which combination of tasks meets the requirements.
What does this disclosure of the official IELTS test development process mean for candidates? In the author's view, it has three implications.
Point of Interest One:
According to the officially published development process, new test material is commissioned for writing only once or twice a year, which means the item bank can be updated at most once or twice a year, and even then only partially. The IELTS item pool is therefore relatively stable, and studying jijing (機(jī)經(jīng), collections of test questions recalled by past candidates) can give candidates a degree of familiarity with the questions likely to appear over the next six months to a year, helping them prepare and improve their scores.
Point of Interest Two:
At the same time, since the development process explicitly includes a standards fixing stage, the claims that circulate widely among candidates in China, that the difficulty of the test changes, that the grading standard changes, that tests on certain dates are harder or easier than others, or that a particular sitting was especially hard or easy, are all needless worry. In fact, because of this calibration stage, even if a particular sitting really is somewhat harder, the fixed standards allow the same final score to be achieved with a lower percentage of correct answers, which is exactly what makes the measurement of ability consistent. The Longre Overseas Examination Research Centre therefore advises candidates to concentrate on what actually deserves their attention, improving their language ability and becoming familiar with the test format, rather than indulging in groundless speculation and needless anxiety.
Point of Interest Three:
We can also note that during IELTS test development, only material meeting the IELTS requirements for the Academic and General Training modules in terms of subject matter, length, difficulty, genre and so on is retained, and only after passing through these rigorous steps is a set of test items finally completed. The Longre Overseas Examination Research Centre therefore reminds candidates not to choose preparation materials blindly; grasping at whatever comes to hand is likely to cost both time and money while leading you down the wrong path. Candidates are strongly advised to prepare properly, using materials compiled by capable, professional training institutions in accordance with the official IELTS requirements.
For enquiries, please call 400 666 1553 (China) or 0203 206 1211 (UK), or email china@peinternational.co.uk (China) or enquiry@peinternational.co.uk (UK).