Question 21/XV - 16 kbit/s speech signal encoding and extension to other bandwidths and bit rates

(Continuation of Question 27/XVIII 1985-88)

Considering

(1) that a number of sophisticated coding algorithms for speech can provide reasonable speech quality at a bit rate of 16 kbit/s;
(2) that the rapid progress of LSI circuit technologies makes it possible to substantially reduce the cost and size of codec hardware to perform such sophisticated processing;
(3) that it may be possible to extend the coding to provide wideband speech at 32 kbit/s;
(4) that it is envisaged that several areas need low bit rate speech to reduce transmission cost, radio frequency bandwidth for mobile radio, storage capacity for store-and-forward messaging and to provide economic wideband speech transmission for audiovisual services, etc.,

1. What encoding algorithm(s) should be recommended for speech coding at a bit rate of 16 kbit/s?
2. What should be recommended to allow extension to other bandwidths and bit rates?
3. What should be recommended for transcoding between 16 kbit/s encoding and 64 kbit/s PCM in accordance with Recommendation G.711?
4. What should be recommended for transcoding between 16 kbit/s encoding and 32 kbit/s ADPCM in accordance with Recommendation G.721?
5. What testing conditions and testing methods should be recommended to verify the relevant performance of equipment that incorporates the recommended algorithm(s)?

Points for study

1. The applications and performance requirements that should be considered.
2. Extension to provide wideband speech at 32 kbit/s.
3. Whether it is possible to realize tandeming without any serious quality degradations.
4. The testing conditions and testing methods that should be used to select the encoding algorithm(s) to be recommended.
5. Whether the audio frequency characteristics should be specified.
6. Whether there is a need to take into account voice band signals other than speech, e.g., voice band data, facsimile, etc., for the definition of the algorithm(s) and, if so, the performance requirements for these signals.
7. Network constraints and considerations, e.g. delay and interaction with echo control.

Notes

1. Account should be taken of the work in Study Group XII on speech performance aspects of digital systems and integration of mobile systems into the public switched networks, and subjective evaluation.
2. Account should be taken of the work in Study Group XVIII on ISDN and general network issues.
3. Account should be taken of the work in CCIR Study Group 8 on Programme 40A/8.
4. Account should be taken of the studies carried out under Questions 3, 4 and 10/XV.

ANNEX
(to Question 21/XV)

Terms of reference of the ad hoc Group on 16 kbit/s speech coding

Study Group XVIII agreed to set up an ad hoc Group on 16 kbit/s speech coding with the following terms of reference:

1. To select a unique coding algorithm for the achievement of a CCITT standard by the middle of the next study period (mid-1991), in order to submit the draft Recommendation on this standard to the accelerated procedure. The selected algorithm should have universal application and be implemented by service and network providers with minimum integration difficulty.
2. The performance requirements and objectives for such a unique coding algorithm are given in Appendix 1.
3. The assessment criteria and selection rules for the candidate coding algorithms will be as follows:
   a) candidate algorithms must meet the requirements;
   b) all algorithms meeting the requirements will then be ranked according to two main criteria:
      i) the margins on the various requirement parameters;
      ii) the capability to meet the performance objectives.
   The detailed working criteria have to be defined before the competitive phase.
4. The ad hoc Group on 16 kbit/s speech coding should be closely coordinated with the ad hoc Group on speech quality set up by Study Group XII, as far as quality objectives and test methodology are concerned.
5. The tentative work plan of the ad hoc Group is given in Appendix 2. According to this plan, the Group should report on the progress reached at the first meeting of Study Group XV (Study Group N), likely in April/May 1989. In particular, the Group should advise Study Group XV on the concrete possibility of reaching the unique CCITT standard for universal applications of 16 kbit/s speech coding. If this goal is reachable, Study Group XV should confirm the remaining work plan. If the goal appears not to be feasible, Study Group XV should consider whether the standardization of a limited number of application-dependent solutions is feasible and define new goals and work plans.
6. Network aspects should be reported to Study Group XVIII at its first meeting in 1989.
7. A list of possible network applications for 16 kbit/s speech coding is given in Appendix 3. It is agreed that an algorithm that meets the requirements and reaches to a large extent the objectives set up in Appendix 1 could meet reasonably well all applications listed in Appendix 3, even though it might not be optimal for a certain application. A possible clarification of more significant applications listed in Appendix 3 regarding specific performance objectives and requirements is the following:
   i) videophone and store and forward applications;
   ii) digital mobile radio, cordless telephones and low C/N satellite system applications;
   iii) DCME, PSTN and ISDN applications.

Mr. R. Pietroiusti (SIP, Italy) was appointed Chairman of the ad hoc Group on 16 kbit/s speech coding.
Appendix 1
(to Annex to Question 21/XV)

Performance requirements and objectives for the CCITT standard on 16 kbit/s speech coding algorithm

TABLE 1

PARAMETER                            | REQUIREMENTS                 | OBJECTIVES
-------------------------------------+------------------------------+---------------------------------
Speech quality in qdu's (1)          |                              |
For an input signal nominal level    |                              |
of -22 dB with respect to the        |                              |
overload point:                      |                              |
- BER < 10^-6 random errors          | < 4.0 (2)                    |
- BER < 10^-3 random errors          | Not worse than that of G.721 |
                                     | under similar conditions     |
                                     | (3) (4)                      |
- BER < 10^-2 random errors          | Not worse than that of G.721 |
                                     | under similar conditions (3) |
-------------------------------------+------------------------------+---------------------------------
Speech quality dependency on the     | To be defined. (It will be   | As low as possible
input signal level between -32 dB    | provided by SG XII)          |
and -12 dB with respect to the       |                              |
overload point                       |                              |
-------------------------------------+------------------------------+---------------------------------
One way coder/decoder delay          | < 5                          | < 2
in ms (5)                            |                              |
-------------------------------------+------------------------------+---------------------------------
Encoder/decoder synchronization (6)  | To be defined                | -
-------------------------------------+------------------------------+---------------------------------
Capability to transmit voice-band    | -                            | Data at bit rates as high as
data                                 |                              | possible with satisfactory BER
-------------------------------------+------------------------------+---------------------------------
Capability to transmit (7)           | DTMF, CCITT No. 5            | The tones have to be transmitted
signalling/information tones         | CCITT No. 6 (circuit         | with distortion as low as
                                     | continuity tone)             | possible
                                     | CCITT No. 7 (circuit         |
                                     | continuity tone)             |
                                     | CCITT R2 (8)                 |
                                     | Q.35, Q.23, V.25             |
-------------------------------------+------------------------------+---------------------------------
Capability to transmit music (9)     | -                            | No annoying effects have to be
                                     |                              | generated
-------------------------------------+------------------------------+---------------------------------
Gross bit rate, kbit/s (10) (11)     | 16                           | -
-------------------------------------+------------------------------+---------------------------------
Tandemming capability for the        | 3 asynchronous with a total  | Synchronous tandemming property
speech                               | distortion < 14 qdu          |
-------------------------------------+------------------------------+---------------------------------
Tandemming capability for the        | -                            | 3 asynchronous
voice-band data                      |                              |
-------------------------------------+------------------------------+---------------------------------
Effects of the switching transients  | For further study (12)       |
following data discrimination        |                              |
-------------------------------------+------------------------------+---------------------------------
Convergence time                     | -                            | < 10 ms (13)
-------------------------------------+------------------------------+---------------------------------
Capability to operate at different   | -                            | Graceful speech quality
bit rates                            |                              | degradation when operating at
                                     |                              | lower bit rates
-------------------------------------+------------------------------+---------------------------------
Complexity (14)                      | -                            | As low as possible

Notes to Table 1

1. The numbers of qdu's are averaged values for a number of subjective tests and conditions. The averaging procedure, through appropriate weightings of the tests and conditions, will be provided by Study Group XII.
2. The requirements and the objectives of the qdu refer to the distortion introduced between the input PCM (A/µ law) interface of the coder and the output PCM (A/µ law) interface of the decoder. So, the distortion allowance for the algorithm is 3 qdu.
3. The actual figures of the distortions in these conditions will be spelt out in terms of qdu's in the course of the subjective tests.
4. In the definition of the test conditions, it will be necessary to take into account that this BER condition is not to be treated as a permanent normal condition; rather, it is to be interpreted as a condition that, in probabilistic terms, does not occur for, e.g., 95% of the time.
5. The delay values refer to the delay introduced by the algorithm between the input and output uniform PCM interfaces. To evaluate the overall codec delay, the expected delay due to the PCM coding, implementation processing and serial transmission at 16 kbit/s has to be taken into account, and it will be an important item in the codec selection process.
6. It may be assumed that an external 8 kHz timing signal will be available (e.g. from the 64 kbit/s octet structure). This timing signal may be used to directly develop a 2 bit frame at 16 kbit/s which may be compatible with some algorithms. If the algorithm requires a frame length greater than 2 bit, an internal frame is required within the 16 kbit/s gross bit rate. The performance of the frame alignment strategy of such an internal frame will require specification. It should also be noted that the 16 kbit/s gross bit rate will be exposed to controlled octet slips when carried in 64 kbit/s channels, and the internal frame alignment strategy should allow for this.
7. The actual distortion requirements and objectives for the tones to be transmitted are for further study.
8. It should be noted that the CCITT R2 Signalling System also requires the transmission of outband tones (e.g. 3825 Hz). Methods to transmit these outband tones are for further consideration.
9. Music (both from the radio and electronically generated) is often used in PABX systems when calls are transferred or put on hold.
10. The gross bit rate is inclusive of any overheads for algorithmic functions (e.g. synchronization and possible error correction to meet the speech quality requirements/objectives). It does not include any further overheads for transmission channel dependent functions (e.g. radio channel coding).
11. The transmission delay due to the possible use of error correcting codes for algorithmic purposes (see Note 10) would be included in the limits specified for the one-way coder/decoder delay.
12. The effects of the switching transients following the data discrimination will likely be perceived as impulse noise (clicks) rather than a qdu degradation. They depend, among other things, on the coder/decoder delay and on the convergence time of the algorithm.
13. A short convergence time is necessary to avoid speech clipping.
14. Early evaluation of the complexity will be performed on the basis of the detailed level description of the algorithms (e.g. number of multiplications, number of shifts, etc.).

Appendix 2
(to Annex to Question 21/XV)

Tentative work plan of the ad hoc Group activities

Meeting time, followed by the activities planned for that meeting:

5-12 December '88 (Florida, USA)
   - Agreement on methodology to classify codec performance: assessment, routing and selection criteria;
   - definition of test conditions for the competitive phase taking into account the input from the Study Group XII ad hoc Group;
   - preliminary declaration of intention to submit candidate algorithms;
   - discussion on the need to limit the number of candidates;
   - discussion on patent aspects;
   - organization of the laboratory sessions for the competitive phase;
   - specification of hardware models;
   - definition of host-laboratory facilities;
   - definition of the way to describe the algorithms and to evaluate their complexity.
April '89
   - Submission of high-level description of candidate algorithms;
   - submission of test results for the candidate algorithms by proponents;
   - possible reduction of the number of candidates on the basis of the capability to meet performance requirements;
   - report to Study Group XV (Study Group N) first meeting on the possibility of reaching a unique CCITT standard for universal applications of 16 kbit/s speech encoding.

Spring '89
   First meeting of Study Group XV (Study Group N). At this meeting Study Group XV should either confirm the following plan of the ad hoc Group or define alternative goals and plans.

October/November '89
   - Submission of detailed description of candidate algorithms;
   - laboratory session for the first round of tests.

April '90
   - Comparison of test results;
   - evaluation of algorithm complexity;
   - selection of 1 (2) candidate(s) for further consideration (the candidate selected at this stage may not necessarily be identical to one of the source candidates, owing to a possible compromise mixture);
   - definition of testing methodology for the final round of tests.

October '90
   - Agreement on the selected algorithm;
   - organization of the final round of tests.

November '90
   - Laboratory session for the final round of tests.

March '91
   - Evaluation of the results of the final tests;
   - preparation of the draft Recommendation;
   - preparation of the report to Study Group XV.

April/May '91
   - Submission of draft Recommendation to Study Group XV.

Appendix 3
(to Annex to Question 21/XV)

List of possible applications

The following applications were agreed for 16 kbit/s voice coding:

1. videophone service using transfer rates 64, 2 x 64, 128 kbit/s;
2. cordless telephone;
3. low C/N digital satellite systems. This includes maritime, thin-route and single channel per carrier satellite systems;
4. DCME. In this equipment 16 kbit/s speech coding is generally combined with DSI techniques. The equipment may be used for long terrestrial connections and for digital satellite links generally characterized by high C/N ratios;
5. PSTN. This application covers the encoding of voice telephone channels in trunks, junction or distribution networks (e.g. transcoder equipment);
6. ISDN. Distinguishing features with respect to PSTN for setting 16 kbit/s speech coding requirements are the availability of unrestricted end-to-end digital bearer capabilities, absence of electrical echo control devices and availability of SS No. 7;
7. digital leased lines. Two possibilities may be envisaged in this case: one is where the end-to-end digital leased circuits include only one encoding/decoding; the other is where the end-to-end digital leased circuits are connected into the public network and may include digital transcodings;
8. store and forward systems;
9. voice messages for recorded announcements;
10. land Digital Mobile Radio (DMR) systems;
11. packetized speech;
12. audio channel for low bit rate one-way video service (e.g. surveillance).
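
Illustration of the framing arithmetic in Note 6 to Table 1 (Appendix 1)

Note 6 states that an external 8 kHz timing signal yields a 2 bit frame at 16 kbit/s, and that an algorithm needing a longer frame requires an internal frame. The short Python sketch below only reproduces that arithmetic. The constant and function names are assumptions made for illustration and do not come from the Question text, and the example frame lengths are hypothetical.

```python
# Illustrative arithmetic only; all names here are assumptions, not CCITT terms.
GROSS_BIT_RATE_BPS = 16_000   # 16 kbit/s gross bit rate (Table 1)
PCM_BIT_RATE_BPS = 64_000     # 64 kbit/s channel providing the octet structure
TIMING_HZ = 8_000             # external 8 kHz timing signal (Note 6)


def bits_per_timing_period(bit_rate_bps: int, timing_hz: int = TIMING_HZ) -> int:
    """Return how many bits of a stream fall in one period of the timing signal."""
    bits, remainder = divmod(bit_rate_bps, timing_hz)
    if remainder:
        raise ValueError("bit rate is not a whole number of bits per timing period")
    return bits


def needs_internal_frame(frame_length_bits: int) -> bool:
    """True if the frame is longer than what the 8 kHz timing delimits directly."""
    return frame_length_bits > bits_per_timing_period(GROSS_BIT_RATE_BPS)


if __name__ == "__main__":
    # One timing period lasts 1/8000 s, i.e. 125 microseconds.
    print("timing period (us):", 1_000_000 / TIMING_HZ)
    print("bits per period at 16 kbit/s:", bits_per_timing_period(GROSS_BIT_RATE_BPS))
    print("bits per period at 64 kbit/s:", bits_per_timing_period(PCM_BIT_RATE_BPS))
    # Hypothetical frame lengths, chosen only to exercise the check.
    for frame_bits in (2, 10):
        print(frame_bits, "bit frame needs internal frame:",
              needs_internal_frame(frame_bits))
```

A 2 bit frame is delimited directly by the external timing, whereas any longer frame relies on the algorithm's own alignment strategy, which Note 6 says will require specification.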