BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//8WTDGX
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-spathum24-8WTDGX@pretalx.com
DTSTART;TZID=CET:20240927T113000
DTEND;TZID=CET:20240927T120000
DESCRIPTION:In the past decade\, there has been growing interest in explo
 ring the large amount of data that users around the world generate on so
 cial media. Some estimates put the number of social network users at aro
 und 4.95 billion\, and they generate huge quantities of information tha
 t can be used for research in geography and spatial humanities. Many stu
 dies of data from social networks such as Twitter\, Flickr\, Reddit an
 d TripAdvisor can be found in the academic literature\, especially in fi
 elds like Natural Language Processing (NLP)\, which has itself grown rap
 idly in the past few years. With such a large source of information\, t
 here are many research opportunities across topics from which geograp
 hy and spatial humanities should not be disconnected\, because the avai
 lable information is generated by people and so lends itself to the stu
 dy of many different subjects.\n\nTrying to understand geograp
 hic data from social media has become an important goal for researche
 rs in recent years. But how can we obtain geographic information from so
 urces that are mainly text and pictures? This research tries to answer t
 hat question. Its main goal is to make social media data a source of geo
 graphic information that can be used for research and decision making.\n
 There have been several approaches to this question using social media d
 ata. For example\, some studies have used geotagged photos from Instagra
 m and Flickr to find spatial patterns of sentiment. Another app
 roach uses Named Entity Recognition (NER) to process the text of TripAdv
 isor reviews. In the case of Twitter data\, there have been three main a
 pproaches: 1) using the metadata of the tweets (so-called geotagged twee
 ts)\; 2) inferring the geographic location of a tweet from a combinatio
 n of metadata\, profile data and predictions based on the language of it
 s text\, in order to estimate where the tweet originated\; and fina
 lly\, 3) one of the most common approaches\, using techniques such a
 s NER.\n\n
 Except for a few studies that work with data from Indonesia\, China and I
 ndia and focus on local languages\, most work on this task has been don
 e in English or has taken another route\, such as translating the words f
 rom the original language into English with automated translation metho
 ds. It is in this context that the need arises for models that can be tr
 ained for NER in Spanish\, and specifically the Spanish of Colombia. Tes
 ting them on Spanish-language tweets from Colombia could help advance NE
 R focused on identifying locations in short texts such as tweets. Furthe
 rmore\, NER covers named entities in general\, so it can find names\, lo
 cations\, roles and organizations\; in this case\, the focus is on locat
 ions.
 \n\nTo achieve that goal\, we have been exploring NER methods using super
 vised models: first\, testing some of the available ones\, such as Stanf
 ord NER and the spaCy library NER\, and comparing them with the results o
 f a supervised NER model trained on Colombian Spanish and Colombian topo
 nyms. In this way we can see the improvement in the recognition of locat
 ions for this specific case. By comparing the methodological approaches a
 nd generating the corresponding models\, we could say that a Colombian S
 panish NER is a significant contribution to several fields: 1) research o
 n NER tasks within the scientific community interested in NLP\, and 2) t
 he spatial humanities\, the geography community and institutions\, whic
 h gain another large geographic information resource for further researc
 h and decision making towards a geospatial understanding of the world.\n
 \nAs this work is part
  of a larger effort to understand geographic space in Colombia using dat
 a found in texts (short texts\, in the case of Twitter) processed with N
 LP\, testing NER with Colombian Spanish to extract geographic locations i
 s one of its first steps. Future work will therefore explore unsupervise
 d approaches such as topic modeling and\, finally\, try to summarize th
 e extracted locations with topic and sentiment analysis of the tweets\, a
 ll in an effort to contribute to the spatial humanities and digital huma
 nities.
DTSTAMP:20241016T141805Z
LOCATION:MG1/02.05
SUMMARY:Extracting Geographic Information from Social Media Data\, an appro
 ach using NER with Colombian Spanish - Brayan Oviedo
URL:https://pretalx.com/spathum24/talk/8WTDGX/
END:VEVENT
END:VCALENDAR