Over the last two years, I have interviewed many candidates for entry-level data science roles, and I noticed a few completely avoidable mistakes that come up again and again.
Here are some of those mistakes and ways to avoid them.
Too often I see the following conversation:
“Can you tell me a little bit about XYZ project on your resume?”
“OK, so we start by converting the PDF to images using the pdf2image library and then run Tesseract OCR on them to convert the images into text. …
Named entity recognition (NER) is typically approached as a sequence labelling problem, and models for it are usually evaluated with standard classification metrics such as precision, recall, and F-score. You can read a primer on the topic in this post.
It is the process of identifying proper nouns in a piece of text and classifying them into appropriate categories. These categories can be generic, like ‘Organization’, ‘Person’, and ‘Location’, or they can be tailor-made for a particular application, such as ‘Programming Language’ or ‘Blogging site’.
In simpler terms, if you want to find out the ‘who’, ‘what’, ‘when’, and ‘where’ of a sentence, then NER is your task.
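To make the evaluation side concrete, here is a minimal sketch of how precision, recall, and F-score can be computed for NER output. It rests on a few assumptions that are mine, not the post's: the tags follow the BIO scheme (`B-PER`, `I-PER`, `O`), an entity counts as correct only when both its span and its label match exactly, and the helper names `extract_entities` and `precision_recall_f1` are made up for illustration. Other scoring conventions (token-level, partial match) exist and give different numbers.

```python
from typing import List, Optional, Set, Tuple

# An entity is (start_token, end_token_exclusive, label).
Entity = Tuple[int, int, str]


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect entity spans from a BIO tag sequence (assumed tag scheme)."""
    entities: Set[Entity] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # The trailing "O" is a sentinel that closes a span ending at the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues_span = tag.startswith("I-") and label == tag[2:]
        if continues_span:
            continue
        if label is not None:
            entities.add((start, i, label))
        # "B-" opens a new span; a stray "I-" is treated as opening one too.
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return entities


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level scores under exact span-and-label matching."""
    gold_entities = extract_entities(gold)
    pred_entities = extract_entities(pred)
    true_positives = len(gold_entities & pred_entities)
    precision = true_positives / len(pred_entities) if pred_entities else 0.0
    recall = true_positives / len(gold_entities) if gold_entities else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical six-token sentence: the model misses the final location.
gold_tags = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-LOC"]
pred_tags = ["B-PER", "I-PER", "O", "B-ORG", "O", "O"]
precision, recall, f1 = precision_recall_f1(gold_tags, pred_tags)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the model finds two of the three gold entities and predicts nothing spurious, so precision is perfect while recall is not. Scoring whole entities rather than individual tokens keeps the many `O` tokens from inflating the numbers.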
Take the following example (or try it yourself here):