The word extraction comes from medieval Latin. The term refers to the act and result of extracting: to remove, pull out, or eliminate.
For instance: “The dentist told me that, two hours before the extraction of the tooth, I should take an antibiotic to avoid infections”, “The extraction of clams is prohibited since they are an endangered species”, “Environmentalists claim that gold extraction will destroy the mountain and cause irreversible damage to the ecosystem”.
Different kinds of extraction can be found in many areas. When a person goes to an automated teller machine (ATM), they can make an extraction (a withdrawal) and take money out of their bank account, collecting the bills that the machine dispenses.
Blood extraction (a blood draw), on the other hand, is a procedure carried out in the field of nursing. When blood is drawn from a patient, the sample can be analyzed and valuable information about the individual’s health can be obtained.
In dentistry, extraction is a surgical procedure that consists of removing a tooth or part of it. The dentist uses certain instruments and applies their knowledge and skills to achieve this objective.
In computing, information extraction is an operation carried out to retrieve content from a database. The process can be performed automatically if the information is structured.
The extraction of structured or semi-structured information is one of the tasks of information retrieval, and is carried out on documents that a computer can read. For example, this process takes place when handwritten documents are scanned so that their data can be interpreted and loaded into a digital database; that is, an application must recognize the text and convert it into information that can be stored and edited, instead of simply leaving it as an image.
The form of the texts varies according to the project and the intentions of those who carry out the information extraction. In some cases they are structured forms, usually created by the same company that later extracts the information from them once third parties have filled them in; in other cases they are unstructured texts, such as newspaper articles or works of fiction.
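As a minimal sketch of the structured-form case, the Python snippet below pulls labelled fields out of a filled-in form. The form text, the field labels, and the function name `extract_fields` are all invented for illustration; real forms would need patterns tailored to their layout.

```python
import re

# Hypothetical filled-in form; labels and values are made up for this example.
FORM_TEXT = """\
Name: Marisa Lopez
Date: 2021-03-14
Amount: 250.00
"""

# Assumes one "Label: value" pair per line.
FIELD_PATTERN = re.compile(
    r"^(?P<label>[A-Za-z ]+):[ \t]*(?P<value>.+)$",
    re.MULTILINE,
)


def extract_fields(text):
    """Return a dict mapping each field label to the value filled in."""
    return {
        match.group("label").strip(): match.group("value").strip()
        for match in FIELD_PATTERN.finditer(text)
    }


record = extract_fields(FORM_TEXT)
print(record)
```

Because the layout of a structured form is known in advance, a fixed pattern like this is often enough; unstructured texts such as newspaper articles offer no such guarantee.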
This brings in the concept of natural language: a linguistic variety characteristic of human beings that arises for the purpose of communication, is governed by a specific syntax, and follows the principles of optimization and economy of language. The text sources used for information extraction must contain messages written in a language of this type.
Among the most common information extraction tasks are the following:
* name recognition: information extraction is used to find and classify names of people, companies or places, as well as monetary expressions or other values that belong to predefined categories;
* coreference resolution: this consists of detecting coreference between the entities in a given document, such as the link between the full name of a company and its acronym;
* semantic role labeling: in this case, the process consists of analyzing a text to identify the semantic arguments linked to verbs and to classify them according to their roles. For example, in the sentence «Marisa bought a PDA from Valeria», «Marisa» is recognized as the buyer, «PDA» is the object, «bought» is the verb and «Valeria» is the seller.
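The tasks listed above can be illustrated with a deliberately simple, rule-based Python sketch. It is not how production systems work (those generally rely on trained language models); the function names, the regular expressions, and the company name "Acme Data Systems" are assumptions made only for this example.

```python
import re


def find_money(text):
    """Name recognition for one predefined category: monetary expressions."""
    return re.findall(r"[$€]\s?\d+(?:[.,]\d+)*", text)


def acronym_of(full_name):
    """Build an acronym from the capitalized words of a full name."""
    return "".join(word[0] for word in full_name.split() if word[0].isupper())


def corefers(full_name, mention):
    """Coreference check: is the mention the acronym of the full name?"""
    return acronym_of(full_name) == mention


# Only handles sentences shaped like "<buyer> bought <object> from <seller>".
PURCHASE = re.compile(
    r"(?P<buyer>\w+) (?P<verb>bought) (?:an? |the )?"
    r"(?P<object>.+) from (?P<seller>\w+)\.?$"
)


def purchase_roles(sentence):
    """Return the roles found in a purchase sentence, or None if no match."""
    match = PURCHASE.match(sentence)
    return match.groupdict() if match else None


print(find_money("The PDA cost $120.50 and shipping was € 5"))
# ['$120.50', '€ 5']
print(corefers("Acme Data Systems", "ADS"))
# True
print(purchase_roles("Marisa bought a PDA from Valeria"))
# {'buyer': 'Marisa', 'verb': 'bought', 'object': 'PDA', 'seller': 'Valeria'}
```

Hand-written rules like these break as soon as the wording changes, which is why the tasks are treated as problems of natural language processing rather than simple pattern matching.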
Finally, in mining, extraction is the activity of obtaining minerals from a deposit in order to exploit them commercially: copper extraction, lithium extraction, etc.