In this video, we will discuss Application Program Interfaces that use some kind of artificial intelligence. We will transcribe an audio file using the Watson Speech to Text API. We will then translate the text to a new language using the Watson Language Translator API.

In the API call, you will send a copy of the audio file to the API. This is sometimes called a POST request. Then the API will send back the text transcription of what the individual is saying. Under the hood, the API is making a GET request. We then send the text we would like to translate into a second language to a second API. The API will translate the text and send the translation back to you. In this case, we translate English to Spanish. We then provide an overview of API keys and endpoints, Watson Speech to Text, and Watson Translate.

First, we will review API keys and endpoints. They will give you access to the API. An API key is a way to access the API. It's a unique set of characters that the API uses to identify you and authorize you. Usually, your first call to the API includes the API key.
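The idea of an API call that carries an API key can be sketched in Python. This is a minimal illustration of a POST request, not the actual Watson call; the endpoint URL and key below are placeholders, and the request is built but never sent:

```python
import urllib.request

# Hypothetical endpoint and key, for illustration only.
endpoint = "https://api.example.com/v1/recognize"
api_key = "YOUR_API_KEY"

# Build (but do not send) a POST request that carries the audio
# bytes in the body and the API key in a header.
request = urllib.request.Request(
    endpoint,
    data=b"...audio bytes...",
    headers={
        "Authorization": "Bearer " + api_key,
        "Content-Type": "audio/wav",
    },
    method="POST",
)

print(request.get_method())    # POST
print(request.get_full_url())  # the endpoint the request targets
```

Sending the request would return the service's response; keeping the key in a header rather than the URL is one common way to avoid leaking it in logs.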
This will allow you access to the API. In many APIs, you may get charged for each call. So, like your password, you should keep your API key a secret. An endpoint is simply the location of the service. It's used to find the API on the Internet, just like a web address.

Now, we will transcribe an audio file using the Watson Speech to Text API. Before you start the lab, you should sign up for an API key. We will download an audio file into your directory. First, we import SpeechToTextV1 from ibm_watson. The service endpoint is based on the location of the service instance. We store this information in the variable url_s2t. To find out which URL to use, view the service credentials. You will do the same for your API key. You create a Speech to Text adapter object; the parameters are the endpoint and the API key. You will use this object to communicate with the Watson Speech to Text service. We have the path of the wav file we would like to convert to text. We create the file object wav with the wav file using open.
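The setup can be sketched as follows, assuming a recent version of the ibm-watson Python SDK, which wraps the API key in an IAMAuthenticator; older SDK versions, as shown in the video, instead pass the key and URL directly to the constructor. The credentials and file name are placeholders:

```python
def build_speech_to_text(api_key, url_s2t):
    """Create a Speech to Text adapter object from an API key and endpoint."""
    # Imports live inside the function so the sketch can be read and
    # checked without the ibm-watson package installed.
    from ibm_watson import SpeechToTextV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    authenticator = IAMAuthenticator(api_key)
    s2t = SpeechToTextV1(authenticator=authenticator)
    s2t.set_service_url(url_s2t)
    return s2t

# Usage (requires the real credentials from your service page):
# s2t = build_speech_to_text("YOUR_API_KEY", url_s2t)
# wav = open("example.wav", mode="rb")  # rb: read the file in binary format
```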
We set the mode to rb, which means read the file in binary format. The file object allows us access to the wav file that contains the audio. We use the method recognize from the Speech to Text adapter object. This sends the audio file to the Watson Speech to Text service. The parameter audio is the file object. The content type is the audio file format. The service sends a response, which is stored in the object response. The attribute result contains a Python dictionary. The key results has a value that is a list containing a dictionary. We are interested in the key transcript. We can assign it to the variable recognized_text as follows. recognized_text now contains a string with the transcribed text.

Now let's see how to translate the text using the Watson Language Translator. First, we import LanguageTranslatorV3 from ibm_watson. We assign the service endpoint to the variable url_l2. You can obtain the service endpoint from the lab instructions. You also require an API key; see the lab instructions on how to obtain it.
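Pulling the transcript out of the result dictionary can be sketched with a sample dictionary in the shape the service returns; the transcript text and confidence below are invented for illustration:

```python
# A sample dictionary in the shape of response.result after calling
#   response = s2t.recognize(audio=wav, content_type="audio/wav")
# The transcript text here is made up for illustration.
sample_result = {
    "results": [
        {
            "alternatives": [
                {"transcript": "hello world", "confidence": 0.94}
            ],
            "final": True,
        }
    ]
}

def get_transcript(result):
    """Return the transcript string from a Speech to Text result dict."""
    # results is a list of segments; each segment has a list of
    # alternatives, and each alternative carries a transcript string.
    return result["results"][0]["alternatives"][0]["transcript"]

recognized_text = get_transcript(sample_result)
print(recognized_text)  # hello world
```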
This API request requires a version date; see the documentation. We create a language translator object, LanguageTranslator. We can get a list of the languages that the service can identify as follows. The method returns the language codes. For example, English has the code en and Spanish has the code es. In the last section, we assigned the transcribed text to the variable recognized_text. We can use the method translate. This will translate the text. The result is a DetailedResponse object. The parameter text is the text to translate. model_id is the type of model we would like to use; in this case, we set it to en-es for English to Spanish. We use the method get_result to get the translated text and assign it to the variable translation. The result is a dictionary that includes the translation, word count, and character count. We can obtain the translation and assign it to the variable spanish_translation as follows. Using the variable spanish_translation, we can translate the text back to English as follows. The result is a dictionary.
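The translate call and the shape of its result can be sketched the same way; the Spanish text and counts below are a made-up sample, not real service output:

```python
# After
#   translation_response = language_translator.translate(
#       text=recognized_text, model_id="en-es")
#   translation = translation_response.get_result()
# translation is a dictionary shaped like this sample
# (the text and counts here are invented for illustration):
translation = {
    "translations": [{"translation": "hola mundo"}],
    "word_count": 2,
    "character_count": 11,
}

def get_translation(result):
    """Return the translated string from a Language Translator result dict."""
    return result["translations"][0]["translation"]

spanish_translation = get_translation(translation)
print(spanish_translation)  # hola mundo

# Translating back simply uses the reverse model id, e.g. "es-en",
# and "en-fr" would translate English to French.
```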
We can obtain the string with the text as follows. We can then translate the text to French as follows.

Thanks for watching this video.

(Music)