Call graph construction is the foundation of inter-procedural static analysis. However, constructing precise call graphs for Python programs while maintaining high efficiency remains a significant challenge. For instance, the state-of-the-art approach, PyCG, fails to scale to large programs. In our preliminary experiments, it ran out of memory or exceeded the time limit for programs exceeding 2,000 lines of code. This limitation stems from the costly global fixed-point iterations required during analysis. In addition, PyCG is flow-insensitive and does not fully support Python’s dynamic features, which further limits its accuracy.
To overcome these drawbacks, we propose a scalable and precise approach for constructing application-centered call graphs for Python programs, and implement it as a prototype named Jarvis. Jarvis maintains a type graph (i.e., type relations of program identifiers) for each function in a program to allow type inference. Taking one function as input, Jarvis generates the call graph on-the-fly: flow-sensitive intra-procedural analysis and inter-procedural analysis are conducted in turn, with strong updates. Unlike traditional whole-program analyses (e.g., PyCG) that rely on costly global fixed-point iterations, Jarvis constructs call graphs on-the-fly using function-scoped type graphs. By propagating type and call information in a single pass and reusing function-level type relations, Jarvis achieves precise, flow-sensitive call graph construction without repeated whole-program iterations. Our evaluation on a micro-benchmark of 135 small Python programs and a macro-benchmark of 6 real-world Python applications demonstrates that Jarvis significantly improves on PyCG: it is at least 67% faster, with 84% higher precision and at least 20% higher recall.
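To make the terms "flow-sensitive" and "strong update" concrete, the following toy sketch resolves call targets for a straight-line function body in two ways. It is only an illustration under simplifying assumptions (a single function, no branches, statements encoded as hypothetical tuples); it is not the Jarvis implementation and the function names are invented for this example.

```python
from typing import Dict, List, Set, Tuple

# A statement is ("assign", variable, function_name) or ("call", variable, "").
# This tuple encoding is a hypothetical simplification for illustration only.
Stmt = Tuple[str, str, str]


def resolve_flow_insensitive(stmts: List[Stmt]) -> List[Set[str]]:
    """Ignore statement order: a variable may point to every function ever assigned to it."""
    points_to: Dict[str, Set[str]] = {}
    for kind, var, func in stmts:
        if kind == "assign":
            points_to.setdefault(var, set()).add(func)
    return [set(points_to.get(var, set())) for kind, var, _ in stmts if kind == "call"]


def resolve_flow_sensitive(stmts: List[Stmt]) -> List[Set[str]]:
    """Follow statement order: each assignment replaces the old binding (strong update)."""
    points_to: Dict[str, Set[str]] = {}
    callees: List[Set[str]] = []
    for kind, var, func in stmts:
        if kind == "assign":
            points_to[var] = {func}  # strong update: the previous targets are discarded
        elif kind == "call":
            callees.append(set(points_to.get(var, set())))
    return callees


# Models:  handler = greet; handler(); handler = farewell; handler()
body: List[Stmt] = [
    ("assign", "handler", "greet"),
    ("call", "handler", ""),
    ("assign", "handler", "farewell"),
    ("call", "handler", ""),
]

print(resolve_flow_insensitive(body))
print(resolve_flow_sensitive(body))
```

In this toy model the flow-insensitive resolver reports both `greet` and `farewell` at each call site, while the flow-sensitive resolver reports only `greet` at the first call and only `farewell` at the second, which is the kind of spurious edge that flow-sensitivity removes.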
The paper has been submitted to TOSEM. The Jarvis artifact is provided here.
The micro-benchmark and macro-benchmark are provided in the dataset and grount_truth directories.
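As a rough illustration of how a generated call graph can be compared against a ground-truth one, the sketch below computes edge-level precision and recall. It assumes both graphs are JSON objects mapping each caller to a list of callees (an adjacency-list layout); that layout, the helper names, and the edge-level metric are assumptions made for this example and may differ from the exact format and metric definitions used in the paper and scripts.

```python
import json
from typing import Dict, List, Set, Tuple

CallGraph = Dict[str, List[str]]


def load_call_graph(path: str) -> CallGraph:
    """Load a call graph stored as JSON {caller: [callee, ...]} (assumed layout)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def edges(call_graph: CallGraph) -> Set[Tuple[str, str]]:
    """Flatten an adjacency list into a set of (caller, callee) edges."""
    return {(caller, callee) for caller, callees in call_graph.items() for callee in callees}


def precision_recall(generated: CallGraph, ground_truth: CallGraph) -> Tuple[float, float]:
    """Edge-level precision and recall of `generated` with respect to `ground_truth`."""
    gen, truth = edges(generated), edges(ground_truth)
    true_positives = len(gen & truth)
    precision = true_positives / len(gen) if gen else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# Small inline graphs (hypothetical data, not taken from the benchmarks).
generated = {"main": ["a", "b"], "a": ["c"]}
ground_truth = {"main": ["a"], "a": ["c", "d"]}

p, r = precision_recall(generated, ground_truth)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```

Here two of the three generated edges are correct and two of the three ground-truth edges are found, so both values are 2/3.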
Prerequisites:
- Python = 3.8
- PyCG: tool/PyCG
- Jarvis: tool/Jarvis
run jarvis_cli.py.
Jarvis usage:
$ python3 tool/Jarvis/jarvis_cli.py [module_path1 module_path2 module_path3...] [--package] [--decy] [-o output_path]
Jarvis help:
$ python3 tool/Jarvis/jarvis_cli.py -h
usage: jarvis_cli.py [-h] [--package PACKAGE] [--decy] [--precision]
                     [--moduleEntry [MODULEENTRY ...]]
                     [--operation {call-graph,key-error}] [-o OUTPUT]
                     [module ...]

positional arguments:
  module                modules to be processed, which are also application entries in A.W. mode

options:
  -h, --help            show this help message and exit
  --package PACKAGE     Package containing the code to be analyzed
  --decy                whether analyze the dependencies
  --precision           whether flow-sensitive
  --entry-point [MODULEENTRY ...]
                        Entry functions to be processed
  -o OUTPUT, --output OUTPUT
                        Output call graph path

Example 1: analyze bpytop.py in E.A. mode.
$ python3 tool/Jarvis/jarvis_cli.py dataset/macro_benchmark/pj/bpytop/bpytop.py --package dataset/macro_benchmark/pj/bpytop -o jarvis.json
Example 2: analyze bpytop.py in A.W. mode. Note that all dependencies must first be installed in the virtual environment.
# create a virtualenv environment and activate it
$ virtualenv venv --python=python3.8
$ source venv/bin/activate
# install the dependencies in the virtualenv environment
$ python3 -m pip install psutil
# run jarvis
$ python3 tool/Jarvis/jarvis_cli.py dataset/macro_benchmark/pj/bpytop/bpytop.py --package dataset/macro_benchmark/pj/bpytop --decy -o jarvis.json
cd to the root directory of the unzipped files.
# 1. run micro_benchmark
$ ./reproducing_RQ12_setup/micro_benchmark/test_All.sh
# 2. run macro_benchmark
$ ./reproducing_RQ12_setup/macro_benchmark/pycg_EA.sh
# PyCG iterates once
$ ./reproducing_RQ12_setup/macro_benchmark/pycg_EW.sh 1
# PyCG iterates twice
$ ./reproducing_RQ12_setup/macro_benchmark/pycg_EW.sh 2
# PyCG iterates to convergence
$ ./reproducing_RQ12_setup/macro_benchmark/pycg_EW.sh
$ ./reproducing_RQ12_setup/macro_benchmark/jarvis_AA.sh
$ ./reproducing_RQ12_setup/macro_benchmark/jarvis_EA.sh
$ ./reproducing_RQ12_setup/macro_benchmark/jarvis_AW.sh
Run
$ python3 ./reproducing_RQ1/gen_table.py
The results are shown below:
Run
$ pip3 install matplotlib
$ pip3 install numpy
$ python3 ./reproducing_RQ1/FTG/plot.py
The generated graphs are pycg-ag.pdf, pycg-change-ag.pdf and jarvis-ftg.pdf, which correspond to Fig. 9a, Fig. 9b and Fig. 10, respectively.
Run
$ python3 ./reproducing_RQ2/gen_table.py
The generated results:
Scalability results (RQ1), AE denotes AssertionError:
Accuracy results (RQ2):
The 43 Python projects out of the top 200 highly-starred projects are listed in file
Fastapi, Httpie, Scrapy, Lightning, Airflow, sherlock, wagtail
- Html: CVE-2018-17142 (Golang)
- cryptography: CVE-2016-9243, CVE-2020-36242, CVE-2018-10903
- urllib3: CVE-2021-33503, CVE-2019-11324, CVE-2019-11236, CVE-2020-7212
- requests: CVE-2014-1830, CVE-2015-2296, CVE-2018-18074
- psutil: CVE-2019-18874 (C)
- Numpy: CVE-2021-33430, CVE-2014-1858, CVE-2014-1859, CVE-2017-12852 (cpp)
- lxml: CVE-2021-28957, CVE-2018-19787, CVE-2020-27783, CVE-2014-3146 (js)
- jinja2: CVE-2020-28493, CVE-2014-0012, CVE-2014-1402
- sqlalchemy: CVE-2019-7164, CVE-2019-7548
- httpx: CVE-2021-41945
The CVEs of html, numpy, lxml and psutil are not related to Python, so we do not consider them.
- sherlock.sherlock
  - requests(v2.28.0)
    - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- sherlock.sites
  - requests(v2.28.0)
    - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- airflow.kubernetes.kube_client
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- airflow.providers.cncf.kubernetes.operators.pod
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- airflow.providers.cncf.kubernetes.utils.pod_manager
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- airflow.executors.kubernetes_executor
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
......
- wagtail.contrib.frontend_cache.backends
  - requests(v2.28.0)
    - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- httpie.client
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- httpie.ssl_
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- httpie.models
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- scrapy.downloadermiddlewares.cookies
  - tldextract(v3.4.4)
    - requests(v2.28.0)
      - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- lightning.app.utilities.network
  - requests(v2.28.0)
    - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- lightning.app.utilities.network
  - requests(v2.28.0)
    - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- lightning.app.utilities.network
  - requests(v2.28.0)
    - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
...
According to the patch commit, the vulnerable method of CVE-2021-33503 in urllib3 is urllib3.util.url.
Below is the method-level invocation path:
- httpie.adapters.<main>
  - requests.adapters.<main>
    - urllib3.contrib.socks.<main>
      - urllib3.util.url.<main> ---- CVE-2021-33503
- scrapy.downloadermiddlewares.cookies.<main>
  - tldextract.__init__.<main>
    - tldextract.tldextract.<main>
      - tldextract.suffix_list.<main>
        - requests_file.<main>
          - requests.adapters.<main>
            - urllib3.util.url.<main> ---- CVE-2021-33503
- lightning.app.utilities.network.<main>
  - requests.adapters.<main>
    - urllib3.contrib.socks.<main>
      - urllib3.util.url.<main> ---- CVE-2021-33503
- airflow.providers.amazon.aws.hooks.base_aws.BaseSessionFactory._get_idp_response
  - requests.adapters.<main>
    - urllib3.contrib.socks.<main>
      - urllib3.util.url.<main> ---- CVE-2021-33503
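An invocation path like the ones listed above can be recovered from a call graph with a breadth-first search from an application entry to the vulnerable node. The sketch below is a minimal illustration that assumes the call graph is available as a dictionary mapping each caller to its callees; the function name `find_call_path` and the small example graph are hypothetical, with edges copied from one of the listed paths.

```python
from collections import deque
from typing import Dict, List, Optional


def find_call_path(call_graph: Dict[str, List[str]], entry: str, target: str) -> Optional[List[str]]:
    """Return one shortest caller-to-callee path from `entry` to `target`, or None."""
    parents: Dict[str, Optional[str]] = {entry: None}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the entry, then reverse the result.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for callee in call_graph.get(node, []):
            if callee not in parents:  # visit each node at most once
                parents[callee] = node
                queue.append(callee)
    return None


# Example graph (assumed adjacency-list form) built from one path in the list above.
example_graph = {
    "lightning.app.utilities.network.<main>": ["requests.adapters.<main>"],
    "requests.adapters.<main>": ["urllib3.contrib.socks.<main>"],
    "urllib3.contrib.socks.<main>": ["urllib3.util.url.<main>"],
}

path = find_call_path(
    example_graph,
    "lightning.app.utilities.network.<main>",
    "urllib3.util.url.<main>",
)
print(" -> ".join(path) if path else "unreachable")
```

Breadth-first search is used here because it returns a shortest path, which keeps the reported chain easy to inspect; any graph traversal would answer the reachability question itself.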
PS: <main> represents the body code block of a Python file (Python does not need an entry function).
Our artifact reuses part of the functionality of third-party libraries, i.e., PyCG.
Vitalis Salis et al. PyCG: Practical Call Graph Generation in Python. In 43rd International Conference on Software Engineering (ICSE), 25–28 May 2021.



