Free Seminar for Chula Engineering Students (all majors)

Department of Industrial Engineering, Chula Engineering, and The University of Tokyo, Japan, present
“Text Mining by Using Python: Application to Patent Documents”
Instructor: Prof. Kazuyuki Motohashi (U-Tokyo, Japan)
Teaching Assistant: Dr. Suchit Pongnumkul
Date: May 27, 29, 31, 2019
Time: 09:00-12:00
Venue: Room 407, 4F, Engineering Building 4, Faculty of Engineering, Chulalongkorn University
Language: English (and Thai by TA)

– Introduction to patent data analysis
• What is patent data, and why is it used in technology management research?
• Various kinds of patent databases: PATSTAT, JPO (IIP patent database), USPTO
• Keyword extraction, TF-IDF, similarity measures
• VIDEO (O’Reilly Media)
• HOMEWORK: Apply the video exercise above to patent abstracts and extract three keywords using TF-IDF scores
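As a rough illustration of the homework, the following plain-Python sketch scores each term by its frequency in one abstract times log(N / df), where N is the number of abstracts and df is the number of abstracts containing the term, then keeps the top three. The sample abstracts and the function name `tfidf_top_keywords` are hypothetical, and this is not necessarily the TF-IDF variant used in the video exercise (scikit-learn's `TfidfVectorizer`, for instance, applies a smoothed IDF by default).

```python
import math
import re
from collections import Counter

# Hypothetical toy "patent abstracts" for illustration only (not real patent data).
ABSTRACTS = [
    "A battery pack includes a battery cell and a cooling plate that cools the battery cell.",
    "A camera module captures the image and a processor corrects the image using a lens model.",
    "A vehicle sensor detects a pedestrian and the vehicle brakes when the sensor signal exceeds a threshold.",
]


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_top_keywords(docs, k=3):
    """Return the k highest-scoring TF-IDF terms of each document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), df(t) = number of documents containing t
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    keywords = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically so the output is deterministic.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords.append(ranked[:k])
    return keywords


if __name__ == "__main__":
    for abstract, top3 in zip(ABSTRACTS, tfidf_top_keywords(ABSTRACTS)):
        print(top3, "<-", abstract)
```

On these three abstracts the sketch selects battery, cell, cooling; image, camera, captures; and sensor, vehicle, brakes. Words such as "a", "and", and "the" occur in every abstract, so their IDF is log(1) = 0 and they never win; note that the third keyword in each case is picked by the alphabetical tie-break among equally scored terms, which is a limitation of such short texts.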

– Patent Similarity
• Review of homework
• Text preprocessing: tokenization, normalization (lowercasing), stemming/lemmatization, and stop-word removal
• Building TF-IDF vectors:
① Create a dictionary: map every word to an integer ID
② Build a corpus (a list of bags of words): for each document, the count of each word occurring in it
• Calculating similarity measures between documents: gensim.similarities
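The preprocessing, dictionary, corpus, and similarity steps above can be sketched end to end in plain Python. In Gensim the corresponding pieces are `gensim.corpora.Dictionary` (whose `doc2bow` method produces the (word ID, count) pairs), `gensim.models.TfidfModel`, and `gensim.similarities.MatrixSimilarity`; the version below avoids the dependency so that each step is visible. The stop-word list, the crude suffix-stripping `crude_stem` (a stand-in for a real stemmer or lemmatizer), the sample sentences, and all function names are assumptions for illustration, and the TF-IDF weights will not exactly match Gensim's defaults.

```python
import math
import re
from collections import Counter

# Toy stop-word list (an assumption; real projects usually take a fuller list, e.g. from NLTK).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "by", "for", "on", "with"}

# Hypothetical example sentences standing in for patent abstracts.
DOCS = [
    "A battery cell with a cooling plate cools the battery cells.",
    "Cooling plates cool battery cells in an electric vehicle.",
    "A gene editing method modifies a target gene in plant cells.",
]


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, lowercase, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def build_dictionary(docs_tokens):
    """Step 1: map every distinct word to an integer ID, in first-seen order."""
    word2id = {}
    for tokens in docs_tokens:
        for t in tokens:
            word2id.setdefault(t, len(word2id))
    return word2id


def to_bow(tokens, word2id):
    """Step 2: a bag of words is a sorted list of (word ID, count) pairs."""
    counts = Counter(word2id[t] for t in tokens if t in word2id)
    return sorted(counts.items())


def tfidf_vectors(corpus):
    """Weight each count by idf = log(N / df) and L2-normalize (dicts: ID -> weight)."""
    n_docs = len(corpus)
    df = Counter(term_id for bow in corpus for term_id, _ in bow)
    vectors = []
    for bow in corpus:
        vec = {tid: cnt * math.log(n_docs / df[tid]) for tid, cnt in bow}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({tid: w / norm for tid, w in vec.items()} if norm else vec)
    return vectors


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity; vectors are unit length, so this is a dot product."""
    return [
        [sum(w * other.get(tid, 0.0) for tid, w in vec.items()) for other in vectors]
        for vec in vectors
    ]


if __name__ == "__main__":
    docs_tokens = [preprocess(d) for d in DOCS]
    word2id = build_dictionary(docs_tokens)
    corpus = [to_bow(tokens, word2id) for tokens in docs_tokens]
    for row in cosine_similarity_matrix(tfidf_vectors(corpus)):
        print(["%.2f" % s for s in row])
```

Documents 0 and 1 share the stems battery, cool, and plate and so receive a positive similarity, while document 2 shares only cell, which occurs in every document and therefore carries zero IDF weight; its similarity to the other two is 0.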

– Topic Modeling
• What is topic modeling? (with examples)
① Understanding the concept: LDA (Latent Dirichlet Allocation)
• Topic modeling with the Gensim module in Python for the following three technologies, compared with the JPO classification:
① Artificial intelligence
② Autonomous driving
③ Gene modification technology
• Recommended reference on topic modeling with Gensim
• Assignment for further work (three-week program)
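Gensim's `gensim.models.LdaModel` fits LDA with online variational Bayes. To make the LDA concept concrete without Gensim, here is a minimal sketch of a different but standard fitting method, collapsed Gibbs sampling: each token's topic is repeatedly resampled in proportion to (document-topic count + alpha) × (topic-word count + beta) / (topic size + V × beta). The function name `lda_gibbs`, the hyperparameter defaults (alpha = 0.1, beta = 0.01, 200 iterations), and the toy texts are assumptions for illustration, not the seminar's code.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of documents, each a list of integer word IDs (0 .. V-1).
    Returns (theta, phi): theta[d][k] = P(topic k | document d),
    phi[k][w] = P(word w | topic k).
    """
    rng = random.Random(seed)
    vocab_size = 1 + max(w for doc in docs for w in doc)
    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the current token out of the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional p(z_i = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                # Resample the token's topic and put it back into the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimates from the final sample, smoothed by the Dirichlet priors.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in range(vocab_size)]
        for k in range(n_topics)
    ]
    return theta, phi


if __name__ == "__main__":
    # Hypothetical toy texts with two obvious themes (AI vs. gene editing).
    texts = [
        "neural network learning model neural network",
        "learning model training neural network data",
        "gene dna crispr cell gene editing",
        "crispr gene editing dna cell plant",
    ]
    word2id = {}
    docs = [[word2id.setdefault(w, len(word2id)) for w in t.split()] for t in texts]
    id2word = {i: w for w, i in word2id.items()}
    theta, phi = lda_gibbs(docs, n_topics=2)
    for k, dist in enumerate(phi):
        top = sorted(range(len(dist)), key=lambda w: -dist[w])[:3]
        print("topic", k, [id2word[w] for w in top])
    for d, mix in enumerate(theta):
        print("doc", d, ["%.2f" % p for p in mix])
```

Topic IDs are arbitrary, and on a corpus this small the result depends on the random seed; with real patent abstracts the inferred topics would then be compared against the JPO classification.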

Capacity: 20 seats
(Registration deadline: May 18, 2019)
Contact: Natt Leelawat, D.Eng. (