Tomáš Koctúr: Data science is one of the most attractive areas of IT, and it is nothing to fear

Tomáš is the Data Science ambassador of Deutsche Telekom IT Solutions Slovakia. He has been with our company since 2018, and in his role as ambassador he wants to change people's attitudes towards AI. Why shouldn't we be afraid of AI? Find out in our interview.

How did you get into data science?

I've always been into IT in general, especially things that seemed incomprehensible or very complex to me. I studied telecommunications at TUKE and then did my PhD in speech recognition, where I focused on the automatic preparation of training data for speech recognition systems. I had to solve this task using neural networks, a machine learning method I didn't know at the time, so I started studying them. To really understand the topic I studied it in depth, and I became genuinely interested. Today, AI is the most attractive part of IT for me; however, one must constantly keep up with it, learn and follow trends.

What is actually the right name - artificial intelligence (AI) or machine learning?

Artificial intelligence is more of a marketing name for the area. Machine learning is a better term, as everything is based on what we teach the computer. First we design a suitable mathematical model and then, with the help of a large amount of data and a suitable algorithm, we "teach" the computer this model. People, however, tend to picture AI as the Terminator and often fear for their jobs.

And how should they imagine AI?

As a very large mathematical formula in which the individual variables are not written by a human but are calculated while training the model on a large amount of good-quality data. These techniques open up many possibilities and allow us to scale our work: we can teach a computer to make decisions for us.

I am afraid of many things in life, but mathematical formulas, or machine learning models, are certainly not among them. The thing to remember is that a model is taught to behave in a certain way, and it is created and tested by a human. A common myth is that models teach themselves as they are used - this is not true. Likewise, models have no inherent creativity; their results are determined by a mathematical formula. Current models don't even have memory, which sets them apart from humans. Machine learning really is nothing to be afraid of.
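To make the idea of a "formula whose variables are calculated during training" concrete, here is a minimal sketch in Python. It is an illustration only, not the team's actual method: the formula `y = w * x + b`, the toy data, and the names `train` and `predict` are all assumptions chosen for the example. The two variables `w` and `b` are never typed in by a human; they are computed from the data, and once training ends the formula stays fixed.

```python
# Illustrative sketch only: a tiny "model" y = w * x + b whose variables
# (w and b) are computed from data rather than written by a human.
# The data, learning rate and function names are assumptions for this example.

def train(xs, ys, lr=0.05, epochs=2000):
    """Fit w and b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each variable a small step against its gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Apply the fixed, already-trained formula to a new input."""
    return w * x + b


# Toy training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = train(xs, ys)
print(f"learned w={w:.3f}, b={b:.3f}")
print(f"prediction for x=10: {predict(w, b, 10.0):.3f}")
```

Calling `predict` never changes `w` or `b`, which mirrors the point above: the model does not keep learning while it is being used. Real models follow the same principle with far more variables.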

So the main benefit of machine learning is that it can scale work and make it easier?

Yes. Before, a human had to do a trivial task; today we train a model that can make several thousand decisions in minimal time, and humans can work on more complex and interesting tasks. We have simplified our work, as there are tasks that we no longer have to do manually. I see room for improvement in almost all tasks that work with input data such as numbers, text, sound, image, video, or combinations of these.

We have our own data science team. Can you tell us more about it?

Our team started about 3 years ago and currently includes about 8 data scientists, plus several supporting positions. We have room to grow, but even at this size we are probably the biggest data science team in the region.

We mainly work in the field of Natural Language Processing (NLP). It is specific in that the machine learning models work with unstructured text rather than numbers. For example, we work on various chatbots that communicate with end users in words. We therefore focus on automatic speech recognition, text-to-speech synthesis, natural language processing and understanding of unstructured text, and anomaly detection in both text and metric data. This development is carried out within an international team of our group. In addition, we also work in the field of computer vision, where we are trying to simplify the processes of our accounting department.

What is the most difficult part of being a data scientist?

Our fuel is data, and every data scientist complains about 2 things: too little computing power and too little data. For us, data are worth their weight in gold. It is not enough to have them; we also need to prepare, process, normalize and clean them before we can train a model on them. A simple model can be trained in 5 minutes. Models for automatic speech recognition, computer vision or NLP, for example, take days or even weeks. The more complex the model, the more complex the task. Coming back to the idea that machine learning models are really mathematical equations with lots of variables: the most complex models have up to a billion variables that have to be learned during training, so it often takes an awfully long time.
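As a rough illustration of what "cleaning" and "normalizing" can mean in practice, here is a minimal Python sketch on a short list of numeric measurements. It is a simplified assumption for the sake of example, not a description of the team's pipeline: the sample values, the rule for what counts as invalid, and the function names `clean` and `normalize` are hypothetical, and z-score scaling is only one of several common normalization choices.

```python
import math
from statistics import mean, pstdev


def clean(values):
    """Keep only finite numbers; drop missing entries, NaN and stray text."""
    cleaned = []
    for v in values:
        is_number = isinstance(v, (int, float)) and not isinstance(v, bool)
        if is_number and math.isfinite(v):
            cleaned.append(float(v))
    return cleaned


def normalize(values):
    """Rescale values to zero mean and unit standard deviation (z-score)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw measurements with typical defects mixed in.
raw = [12.0, None, 15.0, float("nan"), 9.0, "n/a", 12.0]

prepared = normalize(clean(raw))
print(prepared)
```

Text, audio and image data need their own, usually much more involved, preparation steps, but the order is the same: remove what cannot be used, then bring the rest to a consistent scale before training.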

We know you are also involved in AI ethics. How specifically?

I try to contribute to this area as a member of the Commission for Ethics and Regulation of Artificial Intelligence, which was established by the Ministry of Investment, Regional Development and Informatization of the Slovak Republic. Development is unstoppable, and ethical problems have been and will continue to be there. They have already surfaced in several countries, so it is essential to learn from the mistakes of others and to help set up regulations for this sector, so that the ethical aspect is preserved but the development of AI is not unnecessarily blocked. Our Commission assists the Ministry in commenting on these topics, whether within the EU, UNESCO or the OECD, as these regulations are mostly made at the supranational level.

How do you see the future of AI?

AI is inevitably part of the future, but I am not one to exaggerate. For example, autonomous cars can reduce the accident rate over the next few years. Based on mileage and accident statistics, the autopilot in Tesla is more reliable than humans, even if it doesn't work 100% of the time. However, people fear autopilot because they might be the one who falls into that one thousandth of a percent. It's similar to the fear of flying, even though flying is statistically the safest means of transportation. In the future, I expect machine learning models to be deployed everywhere until something better is invented to replace them.

What do you think a person needs to work in data science?

Abstract thinking, creativity, and a willingness to explore things in depth and to constantly learn new things. It is sometimes said that data science is more art than science: we often don't know what the right solution is until we try it. Abstract thinking is necessary for the model architectures mentioned earlier, with hundreds of thousands to billions of variables. With that many, it's impossible to sketch everything out; you have to put things together in your head. We often need to think of the data as matrices that flow into the models we design, so that a model works correctly and behaves the way we want it to.

Then there is the desire to constantly learn new things, because time does not stand still and this area is always growing and evolving. What is true today may not be true at all a month from now. An essential part of a data scientist's skillset is working with data: not only processing and transforming them into a suitable form, but often also collecting them from the Internet when the customer has no data or not enough of them. The data scientist must be creative enough in this regard.