SPEECH RECOGNITION AVAILABILITY

In the CLARIAH project we are carrying out automatic speech recognition (ASR) for the following items in the Netherlands Institute for Sound and Vision archive:

  • Radio 1 (Hilversum 2, Radio 1, NPO Radio 1)
  • Radio 5 (Hilversum 5, Radio 747, 747 AM, Radio 5, NPO Radio 5, NPO Radio 5 Nostalgia)
  • Source catalogs (items from the Radio Programma, Weken Nederlandse Radio, and Hoorspelen collections)
  • Television (news and current affairs)

This process is ongoing. This page shows the availability of speech recognition data for the different types of items, both overall and per year. At the end, as an example, it shows the speech recognition coverage for the current affairs programme 'Eenvandaag'. If you have a specific request regarding coverage, we can produce this kind of information for you.

For a certain percentage of items, we have indicated 'ASR impossible'. This means that the material is not digitally available at present, and so cannot be put through the speech recogniser. This may change in the future as more material is digitised.

The graphs are updated regularly; the date of the last update is shown at the bottom of the website. Warning: this page is currently under development, so some graphs may change.

Depending on your browser, the graphs on this page may take a few seconds to load.


This pie chart shows the availability of ASR for Radio 1 (items with 'network' Radio 1, NPO Radio 1 or Hilversum 2). The green segment shows items that have ASR transcripts available. The orange segment shows items that are waiting to be processed by the speech recogniser. The grey segment shows items for which ASR is currently impossible (no digital file available). Note that some of the material that is waiting has only been partially digitised, so only partial ASR will be possible.

This bar chart shows the availability of ASR for Radio 1 per year. The green bars show the material that has ASR, the orange bars show the material that does not have ASR yet, and the grey bars show the material for which ASR is currently impossible (no digital file available). Some older material is missing carrier or sort date information, so it could not be processed. Note that some of the material that is waiting has only been partially digitised, so only partial ASR will be possible.

This pie chart shows the availability of ASR for Radio 5 (items with 'network' Hilversum 5, Radio 5, Radio 747, 747 AM, Radio 5 Nostalgia, NPO Radio 5, NPO Radio 5 Nostalgia) material in the archive. The green segment shows items that have ASR transcripts available. The orange segment shows items that are waiting to be processed by the speech recogniser. The grey segment shows items for which ASR is currently impossible (no digital file available). Note that some of the material that is waiting has only been partially digitised, so only partial ASR will be possible.

This bar chart shows the availability of ASR for Radio 5 per year. The green bars show the material that has ASR, the orange bars show the material that does not have ASR yet, and the grey bars show the material for which ASR is currently impossible (no digital file available). Recent material is missing because Radio 5 has had a large number of different names, some of which still need to be processed. Note that some of the material that is waiting has only been partially digitised, so only partial ASR will be possible.

This pie chart shows the availability of ASR for radio material from certain source catalogs (Radio Programma, Radio Programma (voorlopig), Weken Nederlandse Radio and Hoorspelen). The green segment shows items that have ASR transcripts available. The orange segment shows items that are waiting to be processed by the speech recogniser. The grey segment shows items for which ASR is currently impossible (no digital file available). Note that some of the material that is waiting has only been partially digitised, so only partial ASR will be possible.

This bar chart shows the availability of ASR for the source catalogs per year. The green bars show the material that has ASR, the orange bars show the material that does not have ASR yet, and the grey bars show the material for which ASR is currently impossible (no digital file available). The material is still being processed, starting from the most recent material and working backwards, so ASR is not yet available for older material. Note that some of the material that is waiting has only been partially digitised, so only partial ASR will be possible.

This pie chart shows the availability of ASR for television news and current affairs. The green segment shows items that have ASR transcripts available. The orange segment shows items that are waiting to be processed by the speech recogniser. The grey segment shows items for which ASR is currently impossible (no digital file available). Note that some of the material that is waiting has only been partially digitised, so only partial ASR will be possible.

This bar chart shows the availability of ASR for TV news and current affairs per year. The green bars show the material that has ASR, the orange bars show the material that does not have ASR yet, and the grey bars show the material for which ASR is currently impossible (no digital file available). Note that some of the material that is waiting has only been partially digitised, so only partial ASR will be possible. The material is still being processed, starting from the most recent material and working backwards, so ASR is not yet available for older material. Because new news and current affairs material is added to the archive all the time, some of it has not yet been submitted for processing, which accounts for a large part of the missing material in recent years. TV material is also more prone to speech recognition errors than radio material.

This pie chart shows the availability of ASR for the current affairs programme 'Eenvandaag'. If you want to know the availability for a specific programme, genre, network, etc., please ask.