Teaching
Courses
Currently teaching:
- High Performance Computing course at the Dept. of Environmental Sciences, Informatics and Statistics of the Università Ca’ Foscari di Venezia. Topics: distributed, multicore, and GPU programming. Material is available through Moodle. (A.Y. 2018/19)
- Web Intelligence course at the Dept. of Environmental Sciences, Informatics and Statistics of the Università Ca’ Foscari di Venezia. Topics: supervised and unsupervised methods for data and web mining. Material is available through Moodle. (A.Y. 2018/19)
- Introduction to Coding and Data Management course at the Dept. of Management of the Università Ca’ Foscari di Venezia. Topics: Python programming, problem solving, computational thinking, data management. Material is available through Moodle. (A.Y. 2018/19)
- Highlights in Web Search and Data Mining, a course of the Ph.D. program in “Computer Science” at the Dept. of Environmental Sciences, Informatics and Statistics of the Università Ca’ Foscari di Venezia. Material is available here: https://bitbucket.org/wsdmcourse2018/. (A.Y. 2018/19)
Topics for a Master's Thesis
- Interpretability of Machine-Learning Models. Effective machine-learning models (either forests of decision trees or deep artificial neural networks) are extremely complex, often encompassing several thousand parameters. The goal of this work is to understand which functions are actually learned and which kinds of knowledge are extracted by these models or their subcomponents.
- Learning-to-Rank: State of the Art. A comparison of the most recent algorithms and models for ranking tasks, e.g., ranking web documents in a search engine. Some methods are already implemented in existing software, e.g., by Microsoft, while others could easily be implemented as additional plugins.
- Adversarial ML. Despite recent advances in building effective machine-learning models, these models can easily be fooled by adversarially crafted instances, e.g., slight modifications to a photo may cause face-recognition software to misidentify the subject. We are interested in investigating adversarial attacks in different contexts and scenarios.
- Optimal Decision Trees. The goal of this thesis is to design a novel algorithm for growing decision trees that, in contrast to greedy approaches, aims to build the best possible decision tree.
- BDT: Gradient Boosted Decision Tables for High Accuracy and Scoring Efficiency. BDT is one of the most commonly used algorithms for learning ranking functions; for instance, such methods are used to learn document ranking functions in Web search engines. The goal of the thesis is to improve both the quality of the learned models and the efficiency of training (e.g., via parallelization).
- Fast Traversal of Forests of Decision Trees. Forests of decision trees are among the most effective machine-learning methods, e.g., they have won several Kaggle competitions. Such forests may include thousands of large trees, so using them to score data is expensive, especially in production environments. The goal of this thesis is to design fast and efficient algorithms for traversing forests of decision trees.
- Community Detection on the GPU. Community detection is one of the most common tasks in (social) network analysis. Mining large datasets is very challenging and computationally expensive. The goal of the thesis is to design a novel algorithm for mining communities in very large graphs (e.g., exploiting GPUs).
- Machine-Learning-as-a-Service. Several cloud platforms provide easy access to machine-learning tools. We are interested in understanding the potential of such tools and in investigating whether the properties of the generated models can be systematically verified.
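The "Adversarial ML" topic notes that slight modifications to an input can change a model's decision. As a minimal sketch (not the method of any specific project), the snippet below applies a fast-gradient-sign-style perturbation to a hypothetical linear classifier. Since the gradient of the score w·x + b with respect to x is w, moving every feature by ε against sign(w) (for a positive prediction) lowers the score by exactly ε·‖w‖₁. The weights and input are random stand-ins, not real data, and the function names are illustrative.

```python
"""Toy adversarial perturbation against a linear classifier (FGSM-style)."""
import numpy as np


def predict(w, b, x):
    """Return +1 or -1 depending on the sign of the linear score w.x + b."""
    return 1 if np.dot(w, x) + b >= 0 else -1


def fgsm_linear(w, b, x, eps):
    """Shift every feature by at most eps in the direction that lowers the
    margin of the current prediction (the gradient of w.x + b w.r.t. x is w)."""
    label = predict(w, b, x)
    return x - eps * label * np.sign(w)


def min_flip_eps(w, b, x):
    """Smallest per-feature (L-infinity) budget that reaches the decision
    boundary: each unit of eps changes the score by sum(|w|) = ||w||_1."""
    return abs(np.dot(w, x) + b) / np.sum(np.abs(w))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=1000)   # hypothetical model with many weights
    b = 0.0
    x = rng.normal(size=1000)   # hypothetical input, e.g. a flattened image
    eps = 1.01 * min_flip_eps(w, b, x)   # just past the boundary
    x_adv = fgsm_linear(w, b, x, eps)
    print("original:", predict(w, b, x), "adversarial:", predict(w, b, x_adv))
    print(f"per-feature change: {eps:.4f}")
```

With many features, the per-feature change needed to cross the boundary tends to be small relative to the spread of the features themselves, which is one common intuition for why high-dimensional inputs such as images are vulnerable.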
Contact me if you would like to discuss thesis topics in Data Mining and Information Retrieval.
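The "Fast Traversal" topic starts from the cost of the straightforward way of scoring a forest. The sketch below, under an assumed array-based tree layout (the `Tree` class and its `feature == -1` leaf convention are hypothetical, not taken from any specific library), scores an additive forest by walking each tree from root to leaf and summing the leaf outputs. Its cost grows with the number of trees times the depth reached in each, and every visited node involves a data-dependent branch; this is the kind of baseline that faster traversal algorithms aim to beat.

```python
"""Baseline scoring of an additive forest of decision trees, one tree at a time."""
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Tree:
    # Parallel arrays indexed by node id; node 0 is the root.
    # feature[i] == -1 marks a leaf (hypothetical convention for this sketch).
    feature: List[int]
    threshold: List[float]
    left: List[int]
    right: List[int]
    value: List[float]   # output stored at leaves, ignored at internal nodes


def score_tree(tree: Tree, x: Sequence[float]) -> float:
    """Walk from the root to a leaf, testing one feature per level."""
    node = 0
    while tree.feature[node] != -1:
        if x[tree.feature[node]] <= tree.threshold[node]:
            node = tree.left[node]
        else:
            node = tree.right[node]
    return tree.value[node]


def score_forest(forest: Sequence[Tree], x: Sequence[float]) -> float:
    """Additive ensemble (e.g. gradient boosting): sum the leaf outputs."""
    return sum(score_tree(t, x) for t in forest)


if __name__ == "__main__":
    # Two hypothetical depth-1 trees ("stumps") over a 2-feature input.
    t1 = Tree(feature=[0, -1, -1], threshold=[0.5, 0.0, 0.0],
              left=[1, -1, -1], right=[2, -1, -1], value=[0.0, -1.0, 1.0])
    t2 = Tree(feature=[1, -1, -1], threshold=[2.0, 0.0, 0.0],
              left=[1, -1, -1], right=[2, -1, -1], value=[0.0, 0.3, -0.2])
    print(score_forest([t1, t2], [0.7, 1.5]))
```

Storing each tree as flat arrays rather than linked node objects keeps the sketch close to how trained ensembles are often serialized, and makes the per-node work (one comparison, one index jump) explicit.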
Job Offers and Internships
- Internships and job offers at Bloomberg London.
- 2019 Software Engineer - Summer Internship: https://careers.bloomberg.com/job/detail/70146
- 2019 Software Engineer - Graduate/Entry Level: https://careers.bloomberg.com/job/detail/70145
- 2019 Software Engineer - Industrial Placement: https://careers.bloomberg.com/job/detail/70148
- Job Offer
- A Padua-based company working in telecommunications and servers is looking to hire a full-stack programmer. The ideal candidate is comfortable with GNU/Linux systems, the only working environment in use, and above all has a good knowledge of Python. Knowledge of English is also required.
Other languages and frameworks in use are mainly GoLang, Django, Vue JS, and Ansible. Knowledge of web technologies in general (HTTP, REST, AJAX) and of the SIP protocol (RFC 3261) is welcome, as is familiarity with software such as Asterisk, Kamailio, and FreeSWITCH; databases such as MySQL, PostgreSQL, and Redis; tools such as Git and CI software; and virtualization environments such as KVM, Docker, and Kubernetes.
The position offers the opportunity to work with a degree of autonomy, to develop new projects, and to attend training courses.
- Internship/Thesis at Florence Technologies s.r.l.
- Project on IoT and edge-computing devices. In collaboration with the University of Siena, the company SECO, and several other companies, we will implement a set of libraries for data analysis and prediction that can be used directly on the gateway. The models will receive data from the various sensors connected to the gateway. The idea is to run the training step on a remote machine, deploy the model to the gateway, and perform inference there, taking into account not only prediction accuracy but also performance and memory usage. Use cases may range from fault prediction to self-management tasks of the system.
- Machine learning for text analysis or translation. The company runs a system for managing medical requests, active throughout Europe on behalf of a pharmaceutical company. We are experimenting both with translating the requests and with automatically classifying incoming requests by category/topic. For translation, the idea is to train a model on existing (general-purpose) datasets and then specialize it with transfer learning to extend its knowledge to medical contexts. It would also be interesting to experiment with single multilingual models and compare their performance.