Now that most of our data is online, information comes from a great diversity of actors and is stored in a great variety of formats. These range from highly structured formats, such as traditional databases, to plain text, and include many intermediate formats, such as XML, HTML or JSON, usually grouped under the term semi-structured data. This diversity of formats and the wide range of tools needed to manipulate the data are a problem for end users. The data's structure is also often underspecified. Together, these make it hard for a non-expert simply to extract the data they actually need. Machine learning can help by automatically designing tools that let users query or transform semi-structured data. In this presentation, we will investigate how techniques from grammatical inference can be adapted to this setting.
defended on 16/11/2018