ICWSM’11 Tutorial
Exploratory Network Analysis with Gephi

Instructors: Sébastien Heymann, Julian Bilcke
seb@gephi.org, julian.bilcke@gephi.org

July 17, 2011 | 1 PM - 4 PM
Exploratory Network Analysis with Gephi


This tutorial is an introduction to Gephi, the open source graph network
visualization and manipulation software.

Gephi aims to cover the complete chain from data import to aesthetic
refinement and interaction.

Users interact with the visualization and manipulate structures, shapes
and colors to reveal hidden properties.

The goal is to help data analysts form hypotheses and intuitively discover
patterns or errors in large data collections.

At the end, participants will walk away with the practical knowledge
they need to use Gephi for their own projects.
Exploratory Network Analysis with Gephi


It starts with a brief introduction to the network exploration process and
a hands-on demonstration of the essential functionality of Gephi.

Participants are guided step by step through the complete chain of
representation, manipulation, layout, analysis and aesthetic refinement.
Next, teams work on real datasets.

They finally present their preliminary results. The tutorial concludes with
a general question and answer session.




Requirements


Bring your own laptop with Java and Gephi installed.
Gephi should be updated (menu Help > Check for Updates).

Bring a mouse with a wheel.

If you want, bring a dataset of your own and verify that it loads in Gephi.[1]




[1] http://gephi.org/users/supported-graph-formats/
Workshop Schedule - Part I


Exploratory Network Analysis

•	Exploratory Data Analysis
•	Exploratory Network Analysis
•	Looking for Orderness in Data
•	Examples
•	Guideline

Introduction to Gephi

•	Approach and Community
•	Networked Data
•	Quick Start Demo

        * 30 min break *
Workshop Schedule - Part II


Hands-On!

•	Team Work on a Dataset
•	Presentation of Preliminary Results

Q&A
Exploratory Data Analysis




   Confirmatory: results
   Exploratory: intuition
   Serendipity: surprise

   “The greatest value of a picture is when it forces us
   to notice what we never expected to see”

   Started with John Tukey (1962)
Exploratory Data Analysis




                                 Non-linear processing chain of Ben Fry
                            in Computational Information Design (2004)
Dummy Example


                                                    Observation:
                                                    visual saliences on specific
                                                    file sizes

                                                    External knowledge:
                                                    these sizes correspond to
                                                    films

                                                    New hypothesis on data:
                                                    films are highly exchanged,
                                                    so the study might dig in
                                                    this direction
 P2P file size distribution (Latapy et al., 2008)
Exploratory Network Analysis




  1   see the network
      1st graph viz tool: Pajek (1996)
      Vladimir Batagelj, Andrej Mrvar

  2   interact in real time
      Gephi prototype (2008)
      group, filter, compute metrics...

  3   build a visual language
      size by rank, color by partition,
      label, curved edges, thickness...
Looking for a “Simple Small Truth”?




Drew Conway, What Data Visualization Should Do:     1. Make complex things simple
                                                    2. Extract small information from large data
                                                    3. Present truth, do not deceive
                                             http://www.dataists.com/2010/10/what-data-visualization-should-do-simple-small-truth/
Looking for Orderness in Data


        Vary 3 cursors simultaneously to extract
                  meaningful patterns

MICRO level   <->   MACRO level       at different levels

1 dimension   <->   N dimensions      on multiple dimensions

T+0           <->   T+N               at different time scales
“Zoom” cursor on Quantitative Data

MICRO level   MACRO level

                            Global
                            - connectivity
                            - density
                            - centralization

                            Local
                            - communities
                            - bridges between communities
                            - local centers vs periphery

                            Individual
                            - centrality
                            - distances
                            - neighborhood
                            - location
                            - local authority vs hub
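
As a rough illustration of the global and individual levels listed above, the sketch below computes density (a global measure) and in/out degree (a simple individual centrality) in plain Python. The five-node graph is toy data chosen for illustration, and the helper names `density` and `degrees` are hypothetical; Gephi computes such metrics itself from its Statistics panel.

```python
# Toy directed graph (illustrative data, not one of the tutorial datasets).
edges = [("a", "b"), ("a", "d"), ("b", "c"), ("e", "a"), ("c", "e")]
nodes = sorted({n for edge in edges for n in edge})


def density(n_nodes, n_edges):
    """Directed density: edges present / edges possible (no self-loops)."""
    return n_edges / (n_nodes * (n_nodes - 1))


def degrees(nodes, edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list."""
    in_deg = {n: 0 for n in nodes}
    out_deg = {n: 0 for n in nodes}
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    return {n: (in_deg[n], out_deg[n]) for n in nodes}


graph_density = density(len(nodes), len(edges))  # global level
node_degrees = degrees(nodes, edges)             # individual level
```

Degree is only the simplest centrality; the local level (communities, bridges) needs dedicated algorithms and is not sketched here.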
“Crossing” cursor on Qualitative Data

1 dimension           N dimensions


Social
- who with whom
- communities
- brokerage
- influence and power
- homophily

Semantic
- topics
- thematic clusters

Geographic
- spatial phenomena
“Timeline” cursor on Temporal Data

T+0                        T+N




Evolution of social ties

Evolution of communities

Evolution of topics
Mapping an Innovation Center
Collaborations on projects at Images et Réseaux



                                     Themes and content




                                     Actors




                                     Territory


                     Franck Ghitalla & Ecole de Design de Nantes
Mapping Scientific Cooperations
Network Map: a Series of Choices

 •	corpus
 •	data
 •	graphical operations
 •	algorithms
 •	thresholds
 •	communication goals
Guideline

   # nodes

    1 - 100      lists + edges as a bonus, focus on qualitative data

 100 - 1,000     How do attributes explain the structure?
                 •	easy to read, “obvious” patterns
                 •	focus on entities (in context)
                 •	metrics are tools to describe the graph (centrality, bridging...)
                 •	links help to build and interpret categories of entities
                 challenge: mix attribute crossing and connectivity

1,000 - 50,000   How does the structure explain attributes?
                 •	hard to read, problem of “hidden signals”:
                  	track patterns with various layouts and filtering
                 •	focus on structures
                 •	metrics are tools to build the graph (cosine similarity...)
                 •	categories help to understand the structure
                 challenge: pattern recognition

   > 50,000      requires high computational power
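
The guideline mentions metrics such as cosine similarity as tools to build the graph. A minimal sketch of that idea, assuming each entity is described by a numeric feature vector and that an edge is kept when similarity reaches a chosen threshold. The vectors, the threshold of 0.5, and the function names are all hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_u * norm_v)


def similarity_edges(vectors, threshold):
    """Build undirected weighted edges between sufficiently similar entities."""
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        weight = cosine(vectors[a], vectors[b])
        if weight >= threshold:
            edges.append((a, b, weight))
    return edges


# Hypothetical feature vectors (for example, term counts per document).
vectors = {"doc1": [1, 1, 0], "doc2": [1, 1, 0], "doc3": [0, 0, 1]}
edges = similarity_edges(vectors, threshold=0.5)
```

The threshold is one of the choices that shapes the resulting map: a lower value yields a denser graph, a higher one keeps only the strongest links.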
Gephi now!
Gephi in a Nutshell


                « Like Photoshop™ for graphs. »

   Helps data analysts reveal patterns and trends,
    highlight outliers and tell stories with their data.


•	Network visualization platform
•	Open source, supported by a community
•	Built for performance and usability
•	Extensible by plug-ins
•	Windows, MacOS X, Linux
Gephi Community




                  Nonprofit organization




  Communities     Contributors
                  Mathieu Bastian, Mathieu Jacomy,
                  Eduardo Ramos Ibañez, Sébastien
                  Heymann, Guillaume Ceccarelli,
                  André Panisson, Antonio Patriarca,
                  Cezary Bartosiak, Martin Škurla,
                  Patrick McSweeney, Yi Du, Hélder
                  Suzuki, Daniel Bernardes, Ernesto
                  Aneiro, Keheliya Gallaba, Luiz
                  Ribeiro, Urban Škudnik, Vojtech
                  Bardiovsky, Yudi Xue
Community Mission


         Provide a “sustainable” software

         Maintain the technical ecosystem

            Build a business ecosystem

  Face cutting-edge technological challenges with
                a long-term vision

      Distribute the software in Open Source
Community Values


  Open innovation: ideas and features come from
             the entire community.

      Decisions are taken with transparency.

   We consider this technology as a public good,
         and will keep it in open source.
Diversity of Usages

business              leisure :-)




communication         academic      art
Diversity of Network Encoding


Textual

    V = { a, b, c, d, e }
    E = { (a,b), (a,d), (b,c), (e,a), (c,e) }

Tabular

        a   b   c   d   e
    a   -   1   -   1   -
    b   -   -   1   -   -
    c   -   -   -   -   1
    d   -   -   -   -   -
    e   1   -   -   -   -

Graphical

    [node-link diagram of the same graph]

XML

    <graph>
      <nodes>
        <node id="a" />
        <node id="b" />
        <node id="c" />
        <node id="d" />
        <node id="e" />
      </nodes>
      <edges>
        <edge source="a" target="b" />
        <edge source="a" target="d" />
        <edge source="b" target="c" />
        <edge source="e" target="a" />
        <edge source="c" target="e" />
      </edges>
    </graph>

and many others...
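
The encodings on this slide describe one and the same graph. As a minimal sketch, the Python below starts from the textual form (V and E) and derives the tabular form (an adjacency matrix) and the simplified XML form. The function names are hypothetical, and the XML mirrors the slide's schematic markup rather than a complete GEXF or GraphML file, which would also need a header, namespaces and edge ids.

```python
import xml.etree.ElementTree as ET

# Textual encoding, as on the slide.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "d"), ("b", "c"), ("e", "a"), ("c", "e")]


def to_matrix(nodes, edges):
    """Tabular encoding: matrix[i][j] is 1 if there is an edge i -> j."""
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


def to_xml(nodes, edges):
    """Schematic XML encoding with <nodes> and <edges> sections."""
    graph = ET.Element("graph")
    nodes_el = ET.SubElement(graph, "nodes")
    for n in nodes:
        ET.SubElement(nodes_el, "node", id=n)
    edges_el = ET.SubElement(graph, "edges")
    for source, target in edges:
        ET.SubElement(edges_el, "edge", source=source, target=target)
    return ET.tostring(graph, encoding="unicode")


matrix = to_matrix(nodes, edges)
xml_text = to_xml(nodes, edges)
```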
Software I/O




Input
  databases:        MySQL, PostgreSQL, SQL Server, Neo4j
  file:             CSV, Pajek NET, Guess GDF, GEXF, GraphML, Graphviz DOT,
                    UCInet DL, Netdraw VNA, Tulip TLP, Excel Spreadsheet
  graph streaming
  user input

Output
  file:             CSV, Pajek NET, Guess GDF, GEXF, GraphML,
                    Excel Spreadsheet, SVG, PDF, PNG
Choosing a File Format




[Table: features supported by Gephi, by file format]

Features (columns): Edge List/Matrix Structure, XML Structure, Edge Weight,
Attributes, Visualization Attributes, Attribute Default Values,
Hierarchical Graphs, Dynamics

Formats (rows): CSV, DL Ucinet, DOT Graphviz, GDF, GEXF, GML, GraphML,
NET Pajek, TLP Tulip, VNA Netdraw, Spreadsheet*

* spreadsheets can be loaded in the Data Laboratory
Do you need...


          Many features
          GEXF
          Spreadsheet
          GraphML
          Guess GDF
          GML
          UCINet DL
          Netdraw VNA
          Graphviz DOT
          Pajek NET
          CSV
          Tulip TLP
          Few features

          File type legend: XML, Tabular, Text
Using Gephi




               Demo
Team work




 1   Create a team of 2-3 people.


 2   Choose a dataset.


 3   Explore it for 1 hour.


 4   Two teams present their preliminary findings.
Dataset #1: GitHub Software Repository




 “GitHub is an application used by nearly a million people to store
 over two million code repositories, making GitHub the largest code
                         host in the world.”

Started in 2008, it combines the features of an online social network
and a software repository to lower the barriers to collaboration and
make it easier to contribute code.

                                                 https://github.com
Dataset #1: GitHub Software Repository


Data extracted by Franck Cuny* at Linkfluence SAS

1st release in March 2010 -> this poster
2nd release in June 2011 -> your data

_____________Network of user profiles__________

Nodes: people with at least one repository who
are followed by at least two other people
Edges: A follows B

_____________Network of repositories__________

Nodes: repositories
Edges: A shares a developer with B

        Very few research publications on this OSN!

                                                      * franck.cuny@linkfluence.net
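
The repository network above links two repositories when they share a developer. A minimal sketch of how such edges could be derived from (developer, repository) pairs follows. The sample data and the function name are hypothetical, and counting shared developers as an edge weight is an added assumption; the slide only states that an edge exists.

```python
from collections import defaultdict
from itertools import combinations


def repository_edges(contributions):
    """Map (repo_a, repo_b) -> number of developers the two repositories share.

    contributions: iterable of (developer, repository) pairs.
    """
    repos_by_dev = defaultdict(set)
    for developer, repository in contributions:
        repos_by_dev[developer].add(repository)

    shared = defaultdict(int)
    for repos in repos_by_dev.values():
        # Every pair of repositories touched by the same developer gets a link.
        for a, b in combinations(sorted(repos), 2):
            shared[(a, b)] += 1
    return dict(shared)


# Hypothetical contribution records.
contributions = [
    ("alice", "repo1"), ("alice", "repo2"),
    ("bob", "repo2"), ("bob", "repo3"),
    ("carol", "repo1"), ("carol", "repo2"),
]
repo_edges = repository_edges(contributions)
```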
Dataset #1: GitHub Software Repository


Data extracted by a crawl using the GitHub API
Seed: 10 well-known contributors in the Perl community

Networks by country: Japan, France, United States
Networks by language: Perl, PHP, Python, Ruby

Node attributes:
•	user country
•	number of followers
•	main programming language

Edges:
•	directed
•	weight = number of projects A has forked from B
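
To make the edge definition concrete, here is a minimal sketch that turns fork records into directed, weighted edges (weight = number of projects A has forked from B) and writes them as a CSV edge list. The sample records and function names are hypothetical. The `Source`, `Target` and `Weight` column names are an assumption about what Gephi's spreadsheet import expects, so check the CSV import documentation before relying on them.

```python
import csv
import io
from collections import Counter


def fork_edges(forks):
    """Count forks per (forker, owner) pair: one record per forked project."""
    return Counter(forks)


def to_csv(weights):
    """Serialize {(source, target): weight} as a CSV edge list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])  # assumed column names
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Hypothetical fork records: (user who forked, user forked from).
forks = [("ann", "ben"), ("ann", "ben"), ("ben", "ann"), ("cat", "ben")]
weights = fork_edges(forks)
csv_text = to_csv(weights)
```

Because the edges are directed, ("ann", "ben") and ("ben", "ann") stay separate rows with their own weights.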
Dataset #1: GitHub Software Repository




         Your mission (should you decide to accept it):
      find research hypotheses based on your exploration

  Example question: are the Perl communities based on geography?
Dataset #2: The Irish Blogosphere


“Identifying Representative Textual Sources in Blog Networks”. K. Wade, D.
Greene, C. Lee, D. Archambault, P. Cunningham (2011) http://mlg.ucd.ie/blogs



_______________Blogroll Network______________

Nodes: blogs with more than two blogroll links
Edges: blogroll link (in-link)

_______________Post-link Network_____________

Nodes: blogs with more than two blogroll links
Edges: hyperlink inside post from a blog to another
(post-link)
Dataset #2: The Irish Blogosphere


Data extracted by a crawl at distance 2 from the seed for the in-links
and Google Blog Search for the post-links.
Seed: 21 popular blogs, winners of the “2010 Irish Blog Awards”

Node attributes:
•	post count = total number of posts by blog
•	category = from the Irish blog index at www.irishblogdirectory.com,
  where available
•	infomap_comm = community to which a node belongs (infomap algo)
•	gce_comms = overlapping communities (GCE algo)
•	moses_comms = overlapping communities (MOSES algo)

Edges:
•	directed
•	weight = number of hyperlinks in the Post-link network
Dataset #2: The Irish Blogosphere




                       Your mission:
       explore and try to confirm the official results
Hands-On!


Start:

•	Load a graph
•	Apply a layout
•	Color the nodes by a qualitative variable in Partition Panel
•	Size the nodes by a quantitative variable in Ranking Panel
•	Start to explore...compute metrics, filter the network

End:

•	Export maps to PDF in Preview Tab
•	Save
Presentations




  GitHub Repository   Irish Blogosphere
Gephi Documentation


Web Site:       http://gephi.org

Support:        http://forum.gephi.org
Wiki:           http://wiki.gephi.org
Source code:    https://launchpad.net/gephi


Online Tutorials
http://gephi.org/users/quick-start/
http://gephi.org/users/tutorial-visualization/
http://gephi.org/users/tutorial-layouts/
http://wiki.gephi.org/index.php/Import_CSV_Data
http://wiki.gephi.org/index.php/Import_Dynamic_Data


Tutorial in Spanish
https://code.google.com/p/camon/wiki/Taller_Gephi


Supported Graph Formats
http://gephi.org/users/supported-graph-formats/
Thank You!




             Caspar David Friedrich -
             Wanderer Above the Sea of Fog
Credits


[slide 11] images from Drew Conway
http://www.dataists.com/2010/10/what-data-visualization-should-do-simple-small-truth/

[slide 22 top left] Benoît Vidal at MFG Labs
[slide 22 bottom center] Franck Ghitalla at UTC
[slide 22 right] Studies in MA Digital Fashion at LCF by Peter Jeun Ho Tsang
http://jeunhotsang.com/blog/2010/12/07/prototype/

[slide 27] sketches from Ben Fry, Computational Information Design



           Special Thanks to Franck Ghitalla and Mathieu Jacomy
                         for their insightful discussions.

Mais conteúdo relacionado

Destaque (20)

край де варто жити
край де варто житикрай де варто жити
край де варто жити
 
Excellent Cities For Young Entrepreneurs in 2017 | Jerry Novack
Excellent Cities For Young Entrepreneurs in 2017 | Jerry NovackExcellent Cities For Young Entrepreneurs in 2017 | Jerry Novack
Excellent Cities For Young Entrepreneurs in 2017 | Jerry Novack
 
Grupo restaurante
Grupo restauranteGrupo restaurante
Grupo restaurante
 
NIKHIL LAZARUS (BBA) RESUME
NIKHIL LAZARUS (BBA) RESUMENIKHIL LAZARUS (BBA) RESUME
NIKHIL LAZARUS (BBA) RESUME
 
Morocco mgf - open data
Morocco mgf - open dataMorocco mgf - open data
Morocco mgf - open data
 
Enciclopedia de los diferentes comandos de personalizacion de diapositivas p...
Enciclopedia  de los diferentes comandos de personalizacion de diapositivas p...Enciclopedia  de los diferentes comandos de personalizacion de diapositivas p...
Enciclopedia de los diferentes comandos de personalizacion de diapositivas p...
 
Java EE7: Developing for the Cloud
Java EE7: Developing for the CloudJava EE7: Developing for the Cloud
Java EE7: Developing for the Cloud
 
Logosc
LogoscLogosc
Logosc
 
Grupo restaurante
Grupo restauranteGrupo restaurante
Grupo restaurante
 
BEEP Ofertas Marzo 2014
BEEP Ofertas Marzo  2014BEEP Ofertas Marzo  2014
BEEP Ofertas Marzo 2014
 
Enano newsletter issue20-21
Enano newsletter issue20-21Enano newsletter issue20-21
Enano newsletter issue20-21
 
THE SEASONAL EFFECT
THE SEASONAL EFFECTTHE SEASONAL EFFECT
THE SEASONAL EFFECT
 
Photo story
Photo storyPhoto story
Photo story
 
20101023 mind mapping
20101023   mind mapping20101023   mind mapping
20101023 mind mapping
 
Ismawati
IsmawatiIsmawati
Ismawati
 
RBG COM
RBG COMRBG COM
RBG COM
 
AID Pordenone - Livescribe Smartpen
AID Pordenone - Livescribe SmartpenAID Pordenone - Livescribe Smartpen
AID Pordenone - Livescribe Smartpen
 
Internship Report
Internship ReportInternship Report
Internship Report
 
D7
D7D7
D7
 
Qatar open data
Qatar open dataQatar open data
Qatar open data
 

Semelhante a Gephi icwsm-tutorial

SP1: Exploratory Network Analysis with Gephi
SP1: Exploratory Network Analysis with GephiSP1: Exploratory Network Analysis with Gephi
SP1: Exploratory Network Analysis with GephiJohn Breslin
 
Graph visualization options and latest developments
Graph visualization options and latest developmentsGraph visualization options and latest developments
Graph visualization options and latest developmentsLinkurious
 
Trends in Human-Computer Interaction in Information Seeking
Trends in Human-Computer Interaction in Information SeekingTrends in Human-Computer Interaction in Information Seeking
Trends in Human-Computer Interaction in Information SeekingRich Miller
 
20111103 con tech2011-marc smith
20111103 con tech2011-marc smith20111103 con tech2011-marc smith
20111103 con tech2011-marc smithMarc Smith
 
LSS'11: Charting Collections Of Connections In Social Media
LSS'11: Charting Collections Of Connections In Social MediaLSS'11: Charting Collections Of Connections In Social Media
LSS'11: Charting Collections Of Connections In Social MediaLocal Social Summit
 
Gephi short introduction
Gephi short introductionGephi short introduction
Gephi short introductionSébastien
 
Introduction to the FP7 CODE project @ BDBC
Introduction to the FP7 CODE project @ BDBCIntroduction to the FP7 CODE project @ BDBC
Introduction to the FP7 CODE project @ BDBCFlorian Stegmaier
 
Network of Excellence in Internet Science (Multidisciplinarity and its Implic...
Network of Excellence in Internet Science (Multidisciplinarity and its Implic...Network of Excellence in Internet Science (Multidisciplinarity and its Implic...
Network of Excellence in Internet Science (Multidisciplinarity and its Implic...i_scienceEU
 
Mining Social Graph Data
Mining Social Graph DataMining Social Graph Data
Mining Social Graph DataDrew Conway
 
Relationships Matter: Using Connected Data for Better Machine Learning
Relationships Matter: Using Connected Data for Better Machine LearningRelationships Matter: Using Connected Data for Better Machine Learning
Relationships Matter: Using Connected Data for Better Machine LearningNeo4j
 
Ml pluss ejan2013
Ml pluss ejan2013Ml pluss ejan2013
Ml pluss ejan2013CS, NcState
 
Geographic Information Systems and Social Learning in Participatory Spatial P...
Geographic Information Systems and Social Learning in Participatory Spatial P...Geographic Information Systems and Social Learning in Participatory Spatial P...
Geographic Information Systems and Social Learning in Participatory Spatial P...Robert Goodspeed
 
Social Event Detection using Multimodal Clustering and Integrating Supervisor...
Social Event Detection using Multimodal Clustering and Integrating Supervisor...Social Event Detection using Multimodal Clustering and Integrating Supervisor...
Social Event Detection using Multimodal Clustering and Integrating Supervisor...Symeon Papadopoulos
 
Visualization for Software Analytics
Visualization for Software AnalyticsVisualization for Software Analytics
Visualization for Software AnalyticsMargaret-Anne Storey
 
g-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking Functionalityg-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking FunctionalityNicholas Loulloudes
 
20120301 strata-marc smith-mapping social media networks with no coding using...
20120301 strata-marc smith-mapping social media networks with no coding using...20120301 strata-marc smith-mapping social media networks with no coding using...
20120301 strata-marc smith-mapping social media networks with no coding using...Marc Smith
 
Towards the Intelligent Internet of Everything
Towards the Intelligent Internet of EverythingTowards the Intelligent Internet of Everything
Towards the Intelligent Internet of EverythingRECAP Project
 

Semelhante a Gephi icwsm-tutorial (20)

SP1: Exploratory Network Analysis with Gephi
SP1: Exploratory Network Analysis with GephiSP1: Exploratory Network Analysis with Gephi
SP1: Exploratory Network Analysis with Gephi
 
STI Summit 2011 - Visual analytics and linked data
STI Summit 2011 - Visual analytics and linked dataSTI Summit 2011 - Visual analytics and linked data
STI Summit 2011 - Visual analytics and linked data
 
Graph visualization options and latest developments
Graph visualization options and latest developmentsGraph visualization options and latest developments
Graph visualization options and latest developments
 
Trends in Human-Computer Interaction in Information Seeking
Trends in Human-Computer Interaction in Information SeekingTrends in Human-Computer Interaction in Information Seeking
Trends in Human-Computer Interaction in Information Seeking
 
20111103 con tech2011-marc smith
20111103 con tech2011-marc smith20111103 con tech2011-marc smith
20111103 con tech2011-marc smith
 
LSS'11: Charting Collections Of Connections In Social Media
LSS'11: Charting Collections Of Connections In Social MediaLSS'11: Charting Collections Of Connections In Social Media
LSS'11: Charting Collections Of Connections In Social Media
 
Benoit Visual Only Retrieval
Benoit Visual Only RetrievalBenoit Visual Only Retrieval
Benoit Visual Only Retrieval
 
Blended Libraries (Harald Reiterer)
Blended Libraries (Harald Reiterer)Blended Libraries (Harald Reiterer)
Blended Libraries (Harald Reiterer)
 
Gephi short introduction
Gephi short introductionGephi short introduction
Gephi short introduction
 
Introduction to the FP7 CODE project @ BDBC
Introduction to the FP7 CODE project @ BDBCIntroduction to the FP7 CODE project @ BDBC
Introduction to the FP7 CODE project @ BDBC
 
Network of Excellence in Internet Science (Multidisciplinarity and its Implic...
Network of Excellence in Internet Science (Multidisciplinarity and its Implic...Network of Excellence in Internet Science (Multidisciplinarity and its Implic...
Network of Excellence in Internet Science (Multidisciplinarity and its Implic...
 
Mining Social Graph Data
Mining Social Graph DataMining Social Graph Data
Mining Social Graph Data
 
Relationships Matter: Using Connected Data for Better Machine Learning
Relationships Matter: Using Connected Data for Better Machine LearningRelationships Matter: Using Connected Data for Better Machine Learning
Relationships Matter: Using Connected Data for Better Machine Learning
 
Ml pluss ejan2013
Ml pluss ejan2013Ml pluss ejan2013
Ml pluss ejan2013
 
Geographic Information Systems and Social Learning in Participatory Spatial P...
Geographic Information Systems and Social Learning in Participatory Spatial P...Geographic Information Systems and Social Learning in Participatory Spatial P...
Geographic Information Systems and Social Learning in Participatory Spatial P...
 
Social Event Detection using Multimodal Clustering and Integrating Supervisor...
Social Event Detection using Multimodal Clustering and Integrating Supervisor...Social Event Detection using Multimodal Clustering and Integrating Supervisor...
Social Event Detection using Multimodal Clustering and Integrating Supervisor...
 
Visualization for Software Analytics
Gephi icwsm-tutorial

  • 1. ICWSM’11 Tutorial Exploratory Network Analysis with: Instructors: Sébastien Heymann, Julian Bilcke seb@gephi.org, julian.bilcke@gephi.org July 17, 2011 | 1 PM - 4 PM
  • 2. Exploratory Network Analysis with Gephi This tutorial is an introduction to Gephi, the open source graph network visualization and manipulation software. Gephi aims to fulfill the complete chain from data importing to aesthetics refinements and interaction. Users interact with the visualization and manipulate structures, shapes and colors to reveal hidden properties. The goal is to help data analysts to make hypotheses, intuitively discover patterns or errors in large data collections. At the end, the participants will walk away with the practical knowledge enabling them to use Gephi for their own projects.
  • 3. Exploratory Network Analysis with Gephi It starts with a brief introduction on the network exploration process and a hands-on demonstration of the essential functionalities of Gephi. Participants are guided step by step through the complete chain of representation, manipulation, layout, analysis and aesthetics refinements. Next, teams work on real datasets. They finally present their preliminary results. The tutorial concludes with a general question and answer session.
  • 4. Requirements Bring your own laptop with Java and Gephi installed. Gephi should be updated (menu Help > Check for Updates). Bring a mouse with a wheel. If you want, bring a dataset of your own and verify that it loads well in Gephi.[1] [1] http://gephi.org/users/supported-graph-formats/
  • 5. Workshop Schedule - Part I Exploratory Network Analysis • Exploratory Data Analysis • Exploratory Network Analysis • Looking for Orderness in Data • Examples • Guideline Introduction to Gephi • Approach and Community • Networked Data • Quick Start Demo * 30 min break *
  • 6. Workshop Schedule - Part II Hands-On! • Team Work on a Dataset • Presentation of Preliminary Results Q&A
  • 7. Exploratory Data Analysis [Diagram labels: started with; Confirmatory: results; Exploratory: intuition; Serendipity: surprise] “The greatest value of a picture is when it forces us to notice what we never expected to see” John Tukey (1962)
  • 8. Exploratory Data Analysis Non-linear processing chain of Ben Fry in Computational Information Design (2004)
  • 9. Dummy Example Observation: visual saliences on specific file sizes. External knowledge: these sizes correspond to films. New hypothesis on data: films are highly exchanged, so the study might dig in this direction. [Figure: P2P file size distribution (Latapy et al., 2008)]
  • 10. Exploratory Network Analysis 1 see the network: 1st graph viz tool: Pajek (1996), Vladimir Batagelj, Andrej Mrvar. 2 interact in real time: Gephi prototype (2008); group, filter, compute metrics... 3 build a visual language: size by rank, color by partition, label, curved edges, thickness...
  • 11. Looking for a “Simple Small Truth”? Drew Conway, What Data Visualization Should Do: 1. Make complex things simple 2. Extract small information from large data 3. Present truth, do not deceive http://www.dataists.com/2010/10/what-data-visualization-should-do-simple-small-truth/
  • 12. Looking for Orderness in Data Vary 3 cursors simultaneously to extract meaningful patterns: at different levels (MICRO level to MACRO level), on multiple dimensions (1 dimension to N dimensions), at time scale (T+0 to T+N)
  • 13. “Zoom” cursor on Quantitative Data MICRO level MACRO level Global - connectivity - density - centralization Local - communities - bridges between communities - local centers vs periphery Individual - centrality - distances - neighborhood - location - local authority vs hub
  • 14. “Crossing” cursor on Qualitative Data 1 dimension N dimensions Social - who with whom - communities - brokerage - influence and power - homophily Semantic - topics - thematic clusters Geographic - spatial phenomena
  • 15. “Timeline” cursor on Temporal Data T+0 T+N Evolution of social ties Evolution of communities Evolution of topics
  • 16. Mapping an Innovation Center Collaborations on projects at Images et Réseaux Themes and content Actors Territory Franck Ghitalla & Ecole de Design de Nantes
  • 18. Network Map: a Series of Choices corpus data graphical operations algorithms communication thresholds goals
  • 19. Guideline, by number of nodes. 1 - 100: lists, with edges as a bonus; focus on qualitative data. 100 - 1,000 (how do attributes explain the structure?): easy to read, “obvious” patterns; focus on entities (in context); metrics are tools to describe the graph (centrality, bridging...); links help to build and interpret categories of entities; challenge: mix attribute crossing and connectivity. 1,000 - 50,000 (how does the structure explain attributes?): hard to read, problem of “hidden signals”: track patterns with various layouts and filtering; focus on structures; metrics are tools to build the graph (cosine similarity...); categories help to understand the structure; challenge: pattern recognition. > 50,000: requires high computational power
  • 21. Gephi in a Nutshell « Like Photoshop™ for graphs. » Helps data analysts to reveal patterns and trends, highlight outliers and tell stories with their data. • Network visualization platform • Open source, supported by a community • Built for performance and usability • Extensible by plug-ins • Windows, MacOS X, Linux
  • 22. Gephi Community Nonprofit organization Communities Contributors Mathieu Bastian, Mathieu Jacomy, Eduardo Ramos Ibañez, Sébastien Heymann, Guillaume Ceccarelli, André Panisson, Antonio Patriarca, Cezary Bartosiak, Martin Škurla, Patrick McSweeney, Yi Du, Hélder Suzuki, Daniel Bernardes, Ernesto Aneiro, Keheliya Gallaba, Luiz Ribeiro, Urban Škudnik, Vojtech Bardiovsky, Yudi Xue
  • 23. Community Mission Provide a “sustainable” software Maintain the technical ecosystem Build a business ecosystem Face cutting-edge technological challenges with a long-term vision Distribute the software in Open Source
  • 24. Community Values Open innovation: ideas and features come from the entire community. Decisions are taken with transparency. We consider this technology as a public good, and will keep it in open source.
  • 25. Diversity of Usages business leisure :-) communication academic art
  • 26. Diversity of Network Encoding Textual: V = { a, b, c, d, e }, E = { (a,b), (a,d), (b,c), (e,a), (c,e) }. XML: <graph> <nodes> <node id="a" /> <node id="b" /> <node id="c" /> <node id="d" /> <node id="e" /> </nodes> <edges> <edge source="a" target="b" /> <edge source="a" target="d" /> <edge source="b" target="c" /> <edge source="e" target="a" /> <edge source="c" target="e" /> </edges> </graph>. Tabular (rows are sources, columns are targets a b c d e): a: - 1 - 1 -; b: - - 1 - -; c: - - - - 1; d: - - - - -; e: 1 - - - -. Graphical. And many others...
  • 27. Software I/O Input: databases (MySQL, PostgreSQL, SQL Server), Neo4j, user input, files (CSV, Pajek NET, Guess GDF, GEXF, GraphML, Graphviz DOT, UCInet DL, Netdraw VNA, Tulip TLP, Excel Spreadsheet). Output: files (CSV, Pajek NET, Guess GDF, GEXF, GraphML, Excel Spreadsheet, SVG, PDF, PNG). Also: graph streaming
  • 28. Choosing a File Format [Table: features supported by Gephi. Columns: Edge List/Matrix Structure, XML Structure, Edge Weight, Attributes, Visualization Attributes, Attribute Default Value, Hierarchical Graph, Dynamics. Rows: CSV, DL Ucinet, DOT Graphviz, GDF, GEXF, GML, GraphML, NET Pajek, TLP Tulip, VNA Netdraw, Spreadsheet*] * spreadsheets can be loaded in the Data Laboratory
  • 29. Do you need... [Figure: file formats placed by number of features (few to many) and by file type (Text, Tabular, XML): GEXF, Spreadsheet, GraphML, Guess GDF, GML, UCINet DL, Netdraw VNA, Graphviz DOT, Pajek NET, CSV, Tulip TLP]
  • 30. Using Gephi DEMO
  • 31. Team work 1 Create a team of 2 to 3 people. 2 Choose a dataset. 3 Explore it for 1 hour. 4 Two teams present their preliminary findings.
  • 32. Dataset #1: GitHub Software Repository “GitHub is an application used by nearly a million people to store over two million code repositories, making GitHub the largest code host in the world.” Started in 2008, it provides the features of an online social network and a software repository to lower the barriers of collaboration and make the code easier to contribute. https://github.com
  • 33. Dataset #1: GitHub Software Repository Data extracted by Franck Cuny* at Linkfluence SAS. 1st release in March 2010 -> this poster. 2nd release in June 2011 -> your data. Network of user profiles: Nodes: people with at least one repository who are followed by at least two other people. Edges: A follows B. Network of repositories: Nodes: repositories. Edges: A shares a developer with B. Very few research publications on this OSN! * franck.cuny@linkfluence.net
  • 34. Dataset #1: GitHub Software Repository Data extracted by a crawl using the GitHub API Seed: 10 well-known contributors in the Perl community Networks by country: Japan, France, United States Networks by language: Perl, PHP, Python, Ruby Node attributes: • user country • number of followers • main programming language Edges: • directed • weight = number of projects A has forked from B
  • 35. Dataset #1: GitHub Software Repository Your mission (should you decide to accept it): find research hypotheses based on your exploration Example question: are the Perl communities based on geography?
  • 36. Dataset #2: The Irish Blogosphere “Identifying Representative Textual Sources in Blog Networks”. K. Wade, D. Greene, C. Lee, D. Archambault, P. Cunningham (2011) http://mlg.ucd.ie/blogs _______________Blogroll Network______________ Nodes: blogs with more than two blogroll links Edges: blogroll link (in-link) _______________Post-link Network_____________ Nodes: blogs with more than two blogroll links Edges: hyperlink inside post from a blog to another (post-link)
  • 37. Dataset #2: The Irish Blogosphere Data extracted by a crawl at distance 2 from the seed for the in-links and Google Blog Search for the post-links. Seed: 21 popular blogs, winners of the “2010 Irish Blog Awards” Node attributes: • post count = total number of posts by blog • category = from the Irish blog index at www.irishblogdirectory.com, where available • infomap_comm = community to which a node belongs (infomap algo) • gce_comms = overlapping communities (GCE algo) • moses_comms = overlapping communities (MOSES algo) Edges: • directed • weight = number of hyperlinks in the Post-link network
  • 38. Dataset #2: The Irish Blogosphere Your mission: explore and try to confirm the official results
  • 39. Hands-On! Start: • Load a graph • Apply a layout • Color the nodes by a qualitative variable in Partition Panel • Size the nodes by a quantitative variable in Ranking Panel • Start to explore...compute metrics, filter the network End: • Export maps to PDF in Preview Tab • Save
  • 40. Presentations GitHub Repository Irish Blogosphere
  • 41. Gephi Documentation Web Site: http://gephi.org Support: http://forum.gephi.org Wiki: http://wiki.gephi.org Source code: https://launchpad.net/gephi Online Tutorials http://gephi.org/users/quick-start/ http://gephi.org/users/tutorial-visualization/ http://gephi.org/users/tutorial-layouts/ http://wiki.gephi.org/index.php/Import_CSV_Data http://wiki.gephi.org/index.php/Import_Dynamic_Data Tutorial in Spanish https://code.google.com/p/camon/wiki/Taller_Gephi Supported Graph Formats http://gephi.org/users/supported-graph-formats/
  • 42. Thank You! Caspar David Friedrich - Wanderer Above the Sea of Fog
  • 43. Credits [slide 11] images from Drew Conway http://www.dataists.com/2010/10/what-data-visualization-should-do-simple-small-truth/ [slide 22 top left] Benoît Vidal at MFG Labs [slide 22 bottom center] Franck Ghitalla at UTC [slide 22 right] Studies in MA Digital Fashion at LCF by Peter Jeun Ho Tsang http://jeunhotsang.com/blog/2010/12/07/prototype/ [slide 27] sketches from Ben Fry, Computational Information Design Special Thanks to Franck Ghitalla and Mathieu Jacomy for their insightful discussions.
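
The encodings on slide 26 and two of the metrics named on slide 13 (density, and degree as the simplest centrality) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `to_matrix`, `to_xml`, `density` and `degrees` are hypothetical helpers, not part of Gephi, and the XML uses only the element names shown on the slide, not a complete GEXF file.

```python
import xml.etree.ElementTree as ET

# Textual encoding from slide 26: a node set V and a directed edge set E.
V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("a", "d"), ("b", "c"), ("e", "a"), ("c", "e")]


def to_matrix(nodes, edges):
    """Tabular encoding: matrix[i][j] is 1 when an edge goes from nodes[i] to nodes[j]."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


def to_xml(nodes, edges):
    """XML encoding with the slide's element names (simplified, not full GEXF)."""
    graph = ET.Element("graph")
    nodes_el = ET.SubElement(graph, "nodes")
    for name in nodes:
        ET.SubElement(nodes_el, "node", id=name)
    edges_el = ET.SubElement(graph, "edges")
    for source, target in edges:
        ET.SubElement(edges_el, "edge", source=source, target=target)
    return ET.tostring(graph, encoding="unicode")


def density(nodes, edges):
    """Directed density: edges present divided by the n * (n - 1) possible edges."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def degrees(nodes, edges):
    """In-degree and out-degree per node."""
    result = {name: {"in": 0, "out": 0} for name in nodes}
    for source, target in edges:
        result[source]["out"] += 1
        result[target]["in"] += 1
    return result


if __name__ == "__main__":
    for name, row in zip(V, to_matrix(V, E)):
        print(name, row)
    print(to_xml(V, E))
    print("density:", density(V, E))
    print("degrees of a:", degrees(V, E)["a"])
```

On this five-node example the density is 0.25 (5 of the 20 possible directed edges), and node a has two outgoing edges and one incoming edge. The same edge list drives every encoding, which is the point of the slide: the formats differ in how many features they carry, not in the underlying graph.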