$E = mc^2$
print('Hello World!')
Hello World!
print("Hello World!")
Hello World!
ss = "Haa'ha"
print(ss)
Haa'ha
name = input("Please enter your name: ")
print(name)
Michael
! python --version
Python 3.9.4
# The ! prefix runs a command-line (shell) command from inside the notebook
! ls
Intro.key Python_intro.ipynb foooo.py Intro.pdf Python_intro2.html hello.py Python_intro.html __pycache__ my_package
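To call another program from this environment, such as R, you can use the same trick.
Advantages:
- Text, code, and figures can all live in one place.
- Notebooks convert easily to other formats, for example PDF, HTML, Markdown, and slides.
Disadvantages:
- A notebook with a lot of content easily becomes cluttered.
- Solution: add a table of contents. This was one of my main reasons for switching from Jupyter Notebook to JupyterLab.
If you have written code that you reuse often, tidy it up and build your own library of utilities (introduced below).
Syntax: Markdown
HTML: HyperText Markup Language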
asdasdsa
italic
aasd
conda install <package_name>
conda install -c conda-forge <package_name>
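conda-forge is another channel for managing packages. It is also hosted on conda but is separate from the default channel. Packages on conda-forge are generally updated faster, and a package that is missing from the default channel can sometimes be found on conda-forge.
package_name can also be written with a version number, for example package_name=3.
You can also use pip3, which is similarly foolproof to use.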
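Python has virtualenv, which can be used to manage separate Python environments. Benefits:
Enter the following at the command line:
This creates an environment named myenv under anaconda3/envs/.
To switch to this environment:
To leave the environment when you no longer need it:
What if a package is only available through pip, but I want to install it into a conda virtual environment? See https://stackoverflow.com/questions/41060382/using-pip-to-install-packages-to-anaconda-environment/56889729#56889729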
# This is a comment
a = 10
if a > 0:
    print(a + 3)
else:
    print(-a)
13
# This is ok, but not recommended.
a = 10
if a > 0:
  print(a + 3)
else:
        print(-a)
13
10000000
10000000
10_000_000
10000000
0xa5b3
42419
0.23
0.23
1e5
100000.0
1e-5
1e-05
"asdias"
'asdias'
'I have a \'naive\' dream.'
"I have a 'naive' dream."
print("I have a 'naive' dream.")
I have a 'naive' dream.
print('This is a cat. \n That is a dog.')
This is a cat.
 That is a dog.
print('This is a cat. \t That is a dog.')
This is a cat. That is a dog.
print("This is a cat. \ That is a dog.")
This is a cat. \ That is a dog.
print("This is a cat. \\ That is a dog.")
This is a cat. \ That is a dog.
print("This is a cat. \\\\\\ That is a dog.")
This is a cat. \\\ That is a dog.
If a string needs many escapes, it fills up with \\. Python also allows this:
print(r'This is a cat. \t That is a dog.')
This is a cat. \t That is a dog.
Characters inside r'' are not escaped.
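As a small illustration of where raw strings help, the sketch below compares an escaped and a raw spelling of the same text. The Windows-style path and the regular expression are made-up examples, not part of the original notes.

```python
import re

# Hypothetical Windows-style path: every backslash must be doubled in a normal string.
escaped_path = "C:\\new_folder\\table.txt"
raw_path = r"C:\new_folder\table.txt"   # raw string: backslashes are kept literally

print(escaped_path == raw_path)

# Raw strings are also the usual way to write regular expressions.
pattern = r"\d+\.\d+"                   # digits, a literal dot, digits
match = re.search(pattern, "pi is about 3.14")
print(match.group())
```

Without the r prefix, the \n and \t in the path would have been read as a newline and a tab.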
You can also use """ """, which is typically used to document functions and classes.
def foo():
    """
    This function is for ...
    Arguments:
    Return:
    """
    pass
foo()
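To show what a filled-in docstring might look like, here is a minimal sketch. The function rectangle_area and its arguments are hypothetical and only serve to illustrate the template above.

```python
def rectangle_area(width, height):
    """
    Return the area of a rectangle.

    Arguments:
        width: length of the horizontal side
        height: length of the vertical side
    Return:
        width * height
    """
    return width * height


# The docstring is stored on the function object; it is what help() displays.
print(rectangle_area.__doc__)
area = rectangle_area(3, 4)
print(area)
```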
print("line1 \
line2 \
line3")
line1 line2 line3
print("""line1
line2
line3
""")
line1
line2
line3
The characters of a string can be iterated over.
name
'Michael'
name[0:3]
'Mic'
for letter in name:
    print(letter)
M
i
c
h
a
e
l
name = 'Martin Luther King'
'%s has a dream.' % name
'Martin Luther King has a dream.'
n = 1e6
'%s has %d supporters.' % (name, n)
'Martin Luther King has 1000000 supporters.'
%s: string
%d: integer
%f: floating-point number
%.2f: floating-point number rounded to two decimal places
'%s has %.2f yuan' % ('Michael', 1234.324256)
'Michael has 1234.32 yuan'
# format()
'Bingo, that {0} has {1} seats.'.format('classroom', 35)
'Bingo, that classroom has 35 seats.'
name = 'Lucy'
# f-string
f"{name} has a dream"
'Lucy has a dream'
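The three formatting styles above accept the same kind of format specifiers. The sketch below writes one sentence in all three styles and adds two further specifiers (thousands separator and padding) as illustrations; the variable names are chosen only for this example.

```python
name = 'Michael'
amount = 1234.324256

old_style = '%s has %.2f yuan' % (name, amount)
format_style = '{0} has {1:.2f} yuan'.format(name, amount)
f_style = f'{name} has {amount:.2f} yuan'

print(old_style)
print(old_style == format_style == f_style)

# Format specifiers can also add thousands separators or pad to a width.
with_commas = f'{1e6:,.0f}'
padded = f'{42:>6d}'
print(with_commas, repr(padded))
```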
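What is an encoding?
Computers can only process numbers, so text has to be converted to numbers. The earliest encoding covered only 128 characters (upper- and lowercase English letters, digits, and symbols); this code table is ASCII.
An encoding for Chinese appeared later, named GB2312.
If text is read with a different encoding from the one it was written in, the result is garbled. Unicode was created to solve this problem by unifying all languages in a single character set.
A widely used Unicode encoding is UTF-16, which uses 2 bytes (1 byte is 8 bits) for most characters and 4 bytes for rare ones. Modern operating systems and programming languages generally support Unicode.
However, if most of the text is English (code, for example), storing every character in 2 bytes wastes space and makes storage and transmission inefficient. For storage and transmission, UTF-8 is therefore commonly used. It is a variable-length encoding: the number of bytes depends on the character's code point.
Common English letters take 1 byte, Chinese characters usually take 3 bytes, and rare characters take 4 bytes.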
Character | ASCII | Unicode | UTF-8 |
---|---|---|---|
A | 01000001 | 00000000 01000001 | 01000001 |
中 | x | 01001110 00101101 | 11100100 10111000 10101101 |
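The byte counts in the table can be checked directly in Python. This is a minimal sketch; the four sample characters are chosen for illustration, and 'utf-16-le' is used so that no byte-order mark is added to the length.

```python
samples = ['A', 'ä', '中', '😀']

for ch in samples:
    utf8 = ch.encode('utf-8')
    utf16 = ch.encode('utf-16-le')
    # character, code point in hex, bytes in UTF-8, bytes in UTF-16
    print(ch, hex(ord(ch)), len(utf8), len(utf16))

utf8_lengths = [len(ch.encode('utf-8')) for ch in samples]
utf16_lengths = [len(ch.encode('utf-16-le')) for ch in samples]

# The 16-bit pattern shown in the Unicode column for '中'
bits = format(ord('中'), '016b')
print(bits)
```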
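Computers therefore typically work as follows: text is held as Unicode in memory and converted to UTF-8 when it is saved or transmitted.
In Python 3, strings are Unicode, so Chinese text displays correctly.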
print("中文")
中文
# ord() returns a character's integer code point; chr() converts a code point back to the character
ord('中')
20013
ord('A')
65
chr(1247)
'ӟ'
chr(1123)
'ѣ'
A string literal prefixed with b is a bytes object.
'ABC'
'ABC'
b'ABC' # the contents are stored as bytes, i.e. a storable, machine-readable form
b'ABC'
'中文'.encode('utf-8')
b'\xe4\xb8\xad\xe6\x96\x87'
'中文'.encode('ascii')
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
Cell In[47], line 1
----> 1 '中文'.encode('ascii')

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
'中文'.encode('GB2312')
# A mismatched encoding raises an error or produces garbled text
'中文'.encode('GB2312').decode('utf-8')
'ä'.encode('utf-8')
b'\xc3\xa4'
b'\xc3\xa4'.decode('utf-16')
'꓃'
b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
'中文'
'中文'.encode('GB2312').decode('GB2312')
'中文'
So when working with strings you will often need to convert between str and bytes. Always use UTF-8 for these conversions.
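The most common place where str and bytes meet is file I/O. The following sketch writes and reads a file with an explicit UTF-8 encoding; the temporary directory and the file name demo.txt are throwaway choices made for this example.

```python
import os
import tempfile

text = '中文 and English'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')

    # Text mode with an explicit encoding: str in, bytes on disk.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    # Binary mode shows the bytes that were actually stored.
    with open(path, 'rb') as f:
        raw = f.read()

    # Reading with the same encoding recovers the original str.
    with open(path, 'r', encoding='utf-8') as f:
        round_trip = f.read()

print(type(raw), type(round_trip))
print(round_trip == text)
```

Passing encoding explicitly avoids depending on the platform default, which is not UTF-8 on every system.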
You will also often see the following two lines at the top of a .py file:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
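These are two special comments. The first tells Linux/macOS that the file is an executable program to be run with python3. The second tells the Python interpreter to read the source file as UTF-8 (which is already the default in Python 3).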
import pandas as pd
pd.read_csv('../../../data/AF_Co.txt',sep='\t')
Stkcd | Stknmec | Updt | Conme | Conmee | IndClaCd | Indus | Indnme | Listdt | Udwnm | Sponsor | Http | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ֤ȯ���� | ֤ȯ��� | �������� | ��˾�������� | ��˾Ӣ������ | ��ҵ����� | ��ҵ���루�£� | ��ҵ���� | �״��������� | �������� | �����Ƽ��� | ��˾���ʻ�����ַ |
1 | û�е�λ | û�е�λ | û�е�λ | û�е�λ | û�е�λ | û�е�λ | û�е�λ | û�е�λ | û�е�λ | û�е�λ | û�е�λ | û�е�λ |
2 | 000001 | ƽ������ | 2018-12-24 | ƽ�����йɷ�����˾ | Ping An Bank Co., Ltd. | 2 | J66 | ���ҽ��ڷ��� | 1991-04-03 | ��������֤ȯ��˾ | ���ھ�������֤ȯ��˾ | www.bank.pingan.com |
3 | 000002 | ���A | 2018-12-24 | �����ҵ�ɷ�����˾ | China Vanke Co., Ltd. | 2 | K70 | ���ز�ҵ | 1991-01-29 | ���ھ�������֤ȯ��˾ | ���ھ�������֤ȯ��˾ | www.vanke.com |
4 | 000003 | PT ����A | 2002-06-14 | ����ʵҵ(����)�ɷ�����˾ | Gintian Industry (Group) Co., Ltd. | 1 | M | �ۺ��� | 1991-07-03 | ��������֤ȯ��˾ | ���ھ�������֤ȯ��˾ | www.gintiangroup.com |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3801 | 900952 | ����B�� | 2018-12-24 | ���ݸ۹ɷ�����˾ | Jinzhou Port Co., Ltd. | 2 | G55 | ˮ������ҵ | 1998-05-19 | �㷢֤ȯ�������ι�˾ | �㷢֤ȯ�������ι�˾ | www.jinzhouport.com |
3802 | 900953 | ����B | 2018-12-24 | ���쿭���ɷ�����˾ | Kama Co., Ltd. | 2 | C36 | ��������ҵ | 1998-06-24 | ����֤ȯ����˾ | ����֤ȯ����˾ | www.kama.com.cn |
3803 | 900955 | ����B�� | 2018-12-26 | �������¹ɷ�����˾ | HNA INNOVATION CO.,LTD. | 2 | K70 | ���ز�ҵ | 1999-01-18 | �������֤ȯ�ɷ�����˾ | �������֤ȯ�ɷ�����˾ | www.ninedragon.com.cn,www.hnainnovation.com |
3804 | 900956 | ����B�� | 2018-12-24 | ��ʯ���������ɷ�����˾ | Huangshi Dongbei Electrical Appliance Co., Ltd. | 2 | C34 | ͨ���豸����ҵ | 1999-07-15 | ��֤ͨȯ�������ι�˾ | ��֤ͨȯ�������ι�˾ | www.donper.com |
3805 | 900957 | ����B�� | 2018-12-24 | �Ϻ�����ʵҵ��չ�ɷ�����˾ | Shanghai Lingyun Industries Development Co., Ltd. | 2 | K70 | ���ز�ҵ | 2000-07-28 | ��̩����֤ȯ�ɷ�����˾ | ��̩����֤ȯ�ɷ�����˾ | www.elingyun.com |
3806 rows × 12 columns
pd.read_csv('../../../data/AF_Co.txt',sep='\t',encoding='GB2312')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[54], line 1
----> 1 pd.read_csv('../../../data/AF_Co.txt',sep='\t',encoding='GB2312')

File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers.py:610, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    605 kwds_defaults = _refine_defaults_read(
    606     dialect, delimiter, delim_whitespace, engine, sep, defaults={"delimiter": ","}
    607 )
    608 kwds.update(kwds_defaults)
--> 610 return _read(filepath_or_buffer, kwds)

File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers.py:462, in _read(filepath_or_buffer, kwds)
    459 _validate_names(kwds.get("names", None))
    461 # Create the parser.
--> 462 parser = TextFileReader(filepath_or_buffer, **kwds)
    464 if chunksize or iterator:
    465     return parser

File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers.py:819, in TextFileReader.__init__(self, f, engine, **kwds)
    816 if "has_index_names" in kwds:
    817     self.options["has_index_names"] = kwds["has_index_names"]
--> 819 self._engine = self._make_engine(self.engine)

File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers.py:1050, in TextFileReader._make_engine(self, engine)
   1046     raise ValueError(
   1047         f"Unknown engine: {engine} (valid options are {mapping.keys()})"
   1048     )
   1049 # error: Too many arguments for "ParserBase"
-> 1050 return mapping[engine](self.f, **self.options)

File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers.py:1898, in CParserWrapper.__init__(self, src, **kwds)
   1895     self.handles.handle = self.handles.handle.mmap  # type: ignore[union-attr]
   1897 try:
-> 1898     self._reader = parsers.TextReader(self.handles.handle, **kwds)
   1899 except Exception:
   1900     self.handles.close()

File pandas/_libs/parsers.pyx:518, in pandas._libs.parsers.TextReader.__cinit__()

File pandas/_libs/parsers.pyx:620, in pandas._libs.parsers.TextReader._get_header()

File pandas/_libs/parsers.pyx:814, in pandas._libs.parsers.TextReader._tokenize_rows()

File pandas/_libs/parsers.pyx:1943, in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'gb2312' codec can't decode byte 0x95 in position 221129: illegal multibyte sequence
pd.read_csv('../../../data/AF_Co.txt',sep='\t',encoding='GBK')
Stkcd | Stknmec | Updt | Conme | Conmee | IndClaCd | Indus | Indnme | Listdt | Udwnm | Sponsor | Http | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 证券代码 | 证券简称 | 更新日期 | 公司中文名称 | 公司英文名称 | 行业分类标准 | 行业代码(新) | 行业名称 | 首次上市日期 | 主承销商 | 上市推荐人 | 公司国际互联网址 |
1 | 没有单位 | 没有单位 | 没有单位 | 没有单位 | 没有单位 | 没有单位 | 没有单位 | 没有单位 | 没有单位 | 没有单位 | 没有单位 | 没有单位 |
2 | 000001 | 平安银行 | 2018-12-24 | 平安银行股份有限公司 | Ping An Bank Co., Ltd. | 2 | J66 | 货币金融服务 | 1991-04-03 | 深圳特区证券公司 | 深圳经济特区证券公司 | www.bank.pingan.com |
3 | 000002 | 万科A | 2018-12-24 | 万科企业股份有限公司 | China Vanke Co., Ltd. | 2 | K70 | 房地产业 | 1991-01-29 | 深圳经济特区证券公司 | 深圳经济特区证券公司 | www.vanke.com |
4 | 000003 | PT 金田A | 2002-06-14 | 金田实业(集团)股份有限公司 | Gintian Industry (Group) Co., Ltd. | 1 | M | 综合类 | 1991-07-03 | 深圳特区证券公司 | 深圳经济特区证券公司 | www.gintiangroup.com |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3801 | 900952 | 锦港B股 | 2018-12-24 | 锦州港股份有限公司 | Jinzhou Port Co., Ltd. | 2 | G55 | 水上运输业 | 1998-05-19 | 广发证券有限责任公司 | 广发证券有限责任公司 | www.jinzhouport.com |
3802 | 900953 | 凯马B | 2018-12-24 | 恒天凯马股份有限公司 | Kama Co., Ltd. | 2 | C36 | 汽车制造业 | 1998-06-24 | 华夏证券有限公司 | 华夏证券有限公司 | www.kama.com.cn |
3803 | 900955 | 海创B股 | 2018-12-26 | 海航创新股份有限公司 | HNA INNOVATION CO.,LTD. | 2 | K70 | 房地产业 | 1999-01-18 | 申银万国证券股份有限公司 | 申银万国证券股份有限公司 | www.ninedragon.com.cn,www.hnainnovation.com |
3804 | 900956 | 东贝B股 | 2018-12-24 | 黄石东贝电器股份有限公司 | Huangshi Dongbei Electrical Appliance Co., Ltd. | 2 | C34 | 通用设备制造业 | 1999-07-15 | 国通证券有限责任公司 | 国通证券有限责任公司 | www.donper.com |
3805 | 900957 | 凌云B股 | 2018-12-24 | 上海凌云实业发展股份有限公司 | Shanghai Lingyun Industries Development Co., Ltd. | 2 | K70 | 房地产业 | 2000-07-28 | 国泰君安证券股份有限公司 | 国泰君安证券股份有限公司 | www.elingyun.com |
3806 rows × 12 columns
Evidently the data were not converted from a GB encoding to UTF-8 when they were generated, which creates extra work for users.
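GBK is a superset of GB2312, which is consistent with GBK succeeding where GB2312 failed above. When the encoding of a file is unknown, one pragmatic approach is to try a short list of candidates in order. The helper below, decode_with_fallback, is a hypothetical sketch written for these notes, not a pandas or standard-library function.

```python
def decode_with_fallback(raw, encodings=('utf-8', 'gbk')):
    """Try each candidate encoding in turn; return (text, encoding_used)."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this candidate failed, try the next one
    raise ValueError(f'none of {encodings} could decode the data')


raw = '平安银行'.encode('gbk')   # bytes as a GBK-encoded file would store them
text, used = decode_with_fallback(raw)
print(text, used)
```

This is only a heuristic: a wrong encoding can sometimes decode without raising an error and still produce garbled text, so the result should be inspected.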
True
True
False
False
True and True
True
True or False
True
False and True
False
5 > 2 and 1 > 3
False
not True
False
(5 > 2) & (3 > 2)
True
5 > 2 & 3 > 2
False
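Why: & has higher precedence than >. 2 & 3 evaluates as $$0000 0010\ \& \ 0000 0011 = 0000 0010 = 2$$ so the expression above is actually equivalent to $$5>2>2$$ which Python reads as the chained comparison (5 > 2) and (2 > 2), hence False.
So, to combine two or more conditions with "and"/"or", use and / or, or add parentheses.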
5 > 2 and 3 > 2
True
5 > 2 or 2 > 3
True
5 > 2 | 3 > 2 # why?
True
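One way to answer the "why?" is to evaluate the expression step by step. The sketch below spells out how Python groups the operators; the intermediate variable names are introduced only for this illustration.

```python
# | binds tighter than >, so 5 > 2 | 3 > 2 is parsed as 5 > (2 | 3) > 2.
bit_or = 2 | 3                 # 0b10 | 0b11 == 0b11 == 3
chained_or = 5 > bit_or > 2    # (5 > 3) and (3 > 2)

# The same reading applied to the earlier & example.
bit_and = 2 & 3                # 0b10 & 0b11 == 0b10 == 2
chained_and = 5 > bit_and > 2  # (5 > 2) and (2 > 2)

print(bit_or, chained_or)
print(bit_and, chained_and)
print((5 > 2) | (3 > 2))       # with parentheses: True | True
```

The | version comes out True only by coincidence of the numbers involved, which is why and / or or explicit parentheses are the safer choice.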
None
Naming rules: a combination of upper- and lowercase English letters, underscores, and digits. A name cannot start with a digit.
a = 1
b = 'abs'
c = True
a = 'asd'
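No type declaration is needed when assigning a variable. Languages like this are called dynamically typed. Languages that require the variable's type to be declared are statically typed.
For example, assignment in C++ looks like
int a = 34;
A subsequent a = "ads" then causes an error.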
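For contrast with the C++ case, this short sketch rebinds one Python name to values of different types. It also shows that types are still checked, only at run time rather than at declaration.

```python
a = 34
first_type = type(a)      # int
a = "ads"                 # rebinding to a str is allowed in Python
second_type = type(a)     # str
print(first_type, second_type)

# The type belongs to the object, and it is still checked at run time:
try:
    a + 1                 # str + int is not defined
except TypeError as err:
    print('TypeError:', err)
```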
a = 12
a = a + 13
a += 13
print(a)
38
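Also, an assignment actually involves two steps in memory. For example, a = 'asd' first creates the string 'asd' in memory and then points the name a at it.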
a = 'asd'
b = a
a = 'QWE'
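These statements point a at 'asd' and then point b at the data that a points to. When a is later pointed at different data, the data that b points to does not change.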
print(a)
QWE
print(b)
asd
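The same behaviour can be observed with the is operator, which tests whether two names refer to the same object. This is a minimal sketch of the example above; same_before and same_after are names introduced here.

```python
a = 'asd'
b = a
same_before = a is b      # both names refer to one object
a = 'QWE'                 # a now refers to a different object; b is untouched
same_after = a is b
print(same_before, same_after, b)
```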
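A constant is a value that does not change. Naming convention: constants are written in uppercase letters. Python does not enforce this; it is only a convention.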
START = 2000
PI = 3.14
Also, division differs between Python 2 and Python 3.
3 / 2
1.5
In Python 2, the division above yields an integer: 3 / 2 gives 1
This is floor division. To get floor division in Python 3, write
3 // 2
1
10 % 3
1
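In Python 2, to get a floating-point result you can write 3 / 2.0, which gives 1.5
Also, Python integers have no maximum or minimum value, so their size is limited only by available memory. Floats, by contrast, have a fixed size: values beyond the representable range (about 1.8e308) are represented as inf.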
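A quick check of these limits, as a sketch: sys.float_info.max is the largest finite float on the current platform, and the specific numbers used here are illustrative.

```python
import sys

big = 2 ** 200                      # integers grow as needed
print(big)
print(sys.float_info.max)           # largest finite float, about 1.8e308

overflow = sys.float_info.max * 10  # too large for a float
print(overflow)
print(overflow == float('inf'))

print(7 // 2, -7 // 2)              # floor division rounds toward negative infinity
```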
students = ['Albert',[1,2,3],'Calvin']
students
['Albert', [1, 2, 3], 'Calvin']
students[0]
'Albert'
students.append(2)
students[-1::-1]
[2, 'Calvin', [1, 2, 3], 'Albert']
# slicing
students[0:3]
['Albert', [1, 2, 3], 'Calvin']
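Note: a list slice stops one position before its end index, so the end index itself is excluded. This differs from .loc slicing on a pandas DataFrame, which includes the end label, but it matches .iloc.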
import pandas as pd
df = pd.read_pickle('../../../data/stk_df_2015_2021.pkl')
df
secID | secShortName | exchangeCD | tradeDate | preClosePrice | closePrice | turnoverVol | turnoverValue | dealAmount | turnoverRate | negMarketValue | marketValue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 000001.XSHE | 平安银行 | XSHE | 2015-01-05 | 1293.044 | 1307.737 | 4966040 | 4.565388e+09 | 92478.0 | 0.0291 | 1.575841e+11 | 1.830268e+11 |
1 | 000001.XSHE | 平安银行 | XSHE | 2015-01-06 | 1307.737 | 1288.146 | 3761152 | 3.453446e+09 | 80325.0 | 0.0220 | 1.552233e+11 | 1.802848e+11 |
2 | 000001.XSHE | 平安银行 | XSHE | 2015-01-07 | 1288.146 | 1263.656 | 2951601 | 2.634796e+09 | 72697.0 | 0.0173 | 1.522723e+11 | 1.768574e+11 |
3 | 000001.XSHE | 平安银行 | XSHE | 2015-01-08 | 1263.656 | 1221.208 | 2443951 | 2.128003e+09 | 68734.0 | 0.0143 | 1.471572e+11 | 1.709164e+11 |
4 | 000001.XSHE | 平安银行 | XSHE | 2015-01-09 | 1221.208 | 1231.004 | 4355039 | 3.835378e+09 | 99882.0 | 0.0255 | 1.483376e+11 | 1.722874e+11 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5838745 | 900957.XSHG | 凌云B股 | XSHG | 2021-12-27 | 0.625 | 0.638 | 488511 | 3.081540e+05 | 226.0 | 0.0027 | 1.164720e+08 | 2.209170e+08 |
5838746 | 900957.XSHG | 凌云B股 | XSHG | 2021-12-28 | 0.638 | 0.637 | 177702 | 1.118810e+05 | 64.0 | 0.0010 | 1.162880e+08 | 2.205680e+08 |
5838747 | 900957.XSHG | 凌云B股 | XSHG | 2021-12-29 | 0.637 | 0.630 | 123550 | 7.733300e+04 | 58.0 | 0.0007 | 1.150000e+08 | 2.181250e+08 |
5838748 | 900957.XSHG | 凌云B股 | XSHG | 2021-12-30 | 0.630 | 0.635 | 113600 | 7.130800e+04 | 41.0 | 0.0006 | 1.159200e+08 | 2.198700e+08 |
5838749 | 900957.XSHG | 凌云B股 | XSHG | 2021-12-31 | 0.635 | 0.636 | 167800 | 1.059960e+05 | 74.0 | 0.0009 | 1.161040e+08 | 2.202190e+08 |
5838750 rows × 12 columns
df2 = df.loc[0:3].copy()
df2.index = ['a','b','c','d']
df2.loc['a':'d']
secID | secShortName | exchangeCD | tradeDate | preClosePrice | closePrice | turnoverVol | turnoverValue | dealAmount | turnoverRate | negMarketValue | marketValue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
a | 000001.XSHE | 平安银行 | XSHE | 2015-01-05 | 1293.044 | 1307.737 | 4966040 | 4.565388e+09 | 92478.0 | 0.0291 | 1.575841e+11 | 1.830268e+11 |
b | 000001.XSHE | 平安银行 | XSHE | 2015-01-06 | 1307.737 | 1288.146 | 3761152 | 3.453446e+09 | 80325.0 | 0.0220 | 1.552233e+11 | 1.802848e+11 |
c | 000001.XSHE | 平安银行 | XSHE | 2015-01-07 | 1288.146 | 1263.656 | 2951601 | 2.634796e+09 | 72697.0 | 0.0173 | 1.522723e+11 | 1.768574e+11 |
d | 000001.XSHE | 平安银行 | XSHE | 2015-01-08 | 1263.656 | 1221.208 | 2443951 | 2.128003e+09 | 68734.0 | 0.0143 | 1.471572e+11 | 1.709164e+11 |
df.loc[0:3]
secID | secShortName | exchangeCD | tradeDate | preClosePrice | closePrice | turnoverVol | turnoverValue | dealAmount | turnoverRate | negMarketValue | marketValue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 000001.XSHE | 平安银行 | XSHE | 2015-01-05 | 1293.044 | 1307.737 | 4966040 | 4.565388e+09 | 92478.0 | 0.0291 | 1.575841e+11 | 1.830268e+11 |
1 | 000001.XSHE | 平安银行 | XSHE | 2015-01-06 | 1307.737 | 1288.146 | 3761152 | 3.453446e+09 | 80325.0 | 0.0220 | 1.552233e+11 | 1.802848e+11 |
2 | 000001.XSHE | 平安银行 | XSHE | 2015-01-07 | 1288.146 | 1263.656 | 2951601 | 2.634796e+09 | 72697.0 | 0.0173 | 1.522723e+11 | 1.768574e+11 |
3 | 000001.XSHE | 平安银行 | XSHE | 2015-01-08 | 1263.656 | 1221.208 | 2443951 | 2.128003e+09 | 68734.0 | 0.0143 | 1.471572e+11 | 1.709164e+11 |
df2.iloc[0:3]
secID | secShortName | exchangeCD | tradeDate | preClosePrice | closePrice | turnoverVol | turnoverValue | dealAmount | turnoverRate | negMarketValue | marketValue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
a | 000001.XSHE | 平安银行 | XSHE | 2015-01-05 | 1293.044 | 1307.737 | 4966040 | 4.565388e+09 | 92478.0 | 0.0291 | 1.575841e+11 | 1.830268e+11 |
b | 000001.XSHE | 平安银行 | XSHE | 2015-01-06 | 1307.737 | 1288.146 | 3761152 | 3.453446e+09 | 80325.0 | 0.0220 | 1.552233e+11 | 1.802848e+11 |
c | 000001.XSHE | 平安银行 | XSHE | 2015-01-07 | 1288.146 | 1263.656 | 2951601 | 2.634796e+09 | 72697.0 | 0.0173 | 1.522723e+11 | 1.768574e+11 |
del df
students[-1]
2
students.append('Steve')
students
['Albert', [1, 2, 3], 'Calvin', 2, 'Steve']
students.insert(1,'Hey')
students
['Albert', 'Hey', [1, 2, 3], 'Calvin', 2, 'Steve']
students.pop()
students
['Albert', 'Hey', [1, 2, 3], 'Calvin', 2]
students.pop(1)
students
['Albert', [1, 2, 3], 'Calvin', 2]
students
['Albert', [1, 2, 3], 'Calvin', 2]
students[2] = 'HeyHey'
students
['Albert', [1, 2, 3], 'HeyHey', 2]
list_ = ['a','b',[12,13]]
list_
['a', 'b', [12, 13]]
list_[2][1]
13
list_ = [1,2]
list2 = list_
list2
[1, 2]
# Modify the elements of the first list in place
list_[0] = 'xxx'
list_[1] = 'yyy'
list2
['xxx', 'yyy']
list_[0] = 'xxxx'
list_[1] = 'yyyy'
list2
['xxxx', 'yyyy']
list_[0] = 100
list_[1] = 1000
list2
[100, 1000]
# Rebind the first name to a new list. This cuts the link between list2 and list_
list_ = [2,3]
list2
[100, 1000]
list_[0] = 'asdas'
list_[1] = 'afaf'
list2
[100, 1000]
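When an independent list is wanted rather than a second name for the same list, a copy has to be made explicitly. The sketch below contrasts an alias, a shallow copy, and a deep copy; the variable names are chosen for this illustration.

```python
import copy

original = [1, 2, [3, 4]]
alias = original                 # same list object
shallow = original.copy()        # new outer list, shared inner list
deep = copy.deepcopy(original)   # fully independent copy

original[0] = 'changed'
original[2][0] = 'inner'

print(alias)     # follows every change
print(shallow)   # only the shared inner list changes
print(deep)      # unaffected
```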
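Once the elements of a tuple are set, they cannot be changed.
What is the use of something that cannot be changed? The data are safer.
So use a tuple wherever a tuple will do, to avoid modifying data by accident.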
t = (1,2,3)
t[0]
1
t[1]
2
t[0:3]
(1, 2, 3)
t[0] = 123
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[125], line 1
----> 1 t[0] = 123

TypeError: 'tuple' object does not support item assignment
t = (1)
t
1
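This is because the parentheses () can also be read as the parentheses of an arithmetic expression, which is ambiguous. Python treats the form above as arithmetic parentheses, not as a tuple. To write a single-element tuple, use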
t = (1,)
t
(1,)
t = (1,2,'a')
t
(1, 2, 'a')
What happens if a tuple contains a mutable list?
list_ = ['a', 'b']
t = (1,2,list_)
print(t)
(1, 2, ['a', 'b'])
t[2][0] = 'x'
t[2][1] = 'y'
t
(1, 2, ['x', 'y'])
list_[0] = 'xx'
list_[1] = 'yy'
t
(1, 2, ['xx', 'yy'])
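The above shows that a tuple's immutability concerns which objects its elements refer to, not the contents of those objects: the tuple still points at the same list, but the list itself can change.
Therefore, rebinding a name leaves the tuple unchanged, while modifying the list in place shows through: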
s = 'Hello'
list_ = ['a', 'b']
t = (s,2,list_)
print(t)
('Hello', 2, ['a', 'b'])
s = 'Hey'
print(t)
('Hello', 2, ['a', 'b'])
ss = s
t = (ss,2,list_)
print(t)
('Hey', 2, ['a', 'b'])
s = 'HeyHey'
print(t)
('Hey', 2, ['a', 'b'])
list_[0] = 1000
list_[1] = 1001
print(t)
('Hey', 2, [1000, 1001])
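The distinction between replacing a reference and mutating the referenced object can be made explicit. This sketch reuses the names from the example above and adds a hashability check as a further illustration.

```python
list_ = ['a', 'b']
t = (1, 2, list_)

print(t[2] is list_)      # the tuple stores a reference to the same list

try:
    t[2] = ['x', 'y']     # replacing the reference is not allowed
except TypeError as err:
    print('TypeError:', err)

t[2].append('c')          # mutating the referenced list is allowed
print(t)

# A tuple that contains a list is not hashable, so it cannot be a dict key.
try:
    hash(t)
    hashable = True
except TypeError:
    hashable = False
print(hashable)
```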
age = 20
if age >= 18:
    print('your age is', age)
    print('adult')
your age is 20
adult
age = 3
if age >= 18:
    print('adult')
elif age >= 6:
    print('teenager')
else:
    print('kid')
kid
An if statement is evaluated from top to bottom: as soon as one condition is True, the remaining elif and else branches are skipped.
age = 20
if age >= 6:
    print('teenager')
elif age >= 18:
    print('adult')
else:
    print('kid')
teenager
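Because only the first true branch runs, the most restrictive condition has to come first. The function age_group below is a hypothetical wrapper around the earlier, correctly ordered version, written so that several ages can be checked at once.

```python
def age_group(age):
    """Classify an age using the same labels as the examples above."""
    if age >= 18:        # most restrictive condition first
        return 'adult'
    elif age >= 6:
        return 'teenager'
    else:
        return 'kid'


for age in (3, 12, 20):
    print(age, age_group(age))
```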
if 378:
    print('haha')
haha
As long as the value after if is not 0, an empty str, an empty list, None, and so on, it is treated as True.
len([])
0
if []:
    print('haha')
else:
    print('hehe')
hehe
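The built-in bool() shows how a value is treated by if. The list of sample values below is illustrative and not exhaustive.

```python
values = [0, 0.0, '', [], (), {}, set(), None, 378, 'a', [0], ' ']

# Map the printed form of each value to its truth value.
truthiness = {repr(v): bool(v) for v in values}
for text, result in truthiness.items():
    print(f'{text:>8} -> {result}')
```

Note that [0] and ' ' count as True: a container is judged by whether it is empty, not by what it contains.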
for i in 'abc':
    print(i)
a
b
c
list_ = ['Michael','Miller']
for i in list_:
    print(i)
Michael
Miller
n = 1
sum2 = 0
while n <= 100:
    sum2 += n  # same as sum2 = sum2 + n
    n = n + 1
print(sum2)
5050
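The same sum can be written with a for loop over range, or with the built-in sum(). This is a sketch of two alternatives to the while loop above; the variable names are chosen for this example.

```python
# range(1, 101) yields 1, 2, ..., 100 (the stop value is excluded)
total = 0
for n in range(1, 101):
    total += n

# The built-in sum() does the same in one call.
builtin_total = sum(range(1, 101))
print(total, builtin_total)
```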
# break exits the loop
n = 1
sum2 = 0
while n <= 100:
    print(n)
    sum2 += n  # same as sum2 = sum2 + n
    n = n + 1
    if n == 10:
        break
1
2
3
4
5
6
7
8
9
print(sum2)
45
# continue skips the rest of the current iteration and moves on to the next one
n = 0
while n <= 10:
    n = n + 1
    if n % 2 == 0:
        continue
    print(n)
1
3
5
7
9
11
# n = 0
# while n < 10:
#     if n % 2 == 0:
#         continue
#     n = n + 1
enumerate(students)
<enumerate at 0x7fe3c6fb6500>
for i,j in enumerate(students):
    print(i,j)
0 Albert
1 [1, 2, 3]
2 HeyHey
3 2
d = {'a':1,'b':2}
d['a']
1
d[0]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[157], line 1
----> 1 d[0]

KeyError: 0
d['c'] = 'asd'
d
{'a': 1, 'b': 2, 'c': 'asd'}
d[0] = 'haha'
d
{'a': 1, 'b': 2, 'c': 'asd', 0: 'haha'}
d[0]
'haha'
'a' in d
True
d.keys()
dict_keys(['a', 'b', 'c', 0])
d.items()
dict_items([('a', 1), ('b', 2), ('c', 'asd'), (0, 'haha')])
d.items()[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[166], line 1
----> 1 d.items()[0]

TypeError: 'dict_items' object is not subscriptable
list(d.items())[0]
('a', 1)
d.values()
dict_values([1, 2, 'asd', 'haha'])
d.values()[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[169], line 1
----> 1 d.values()[0]

TypeError: 'dict_values' object is not subscriptable
list(d.values())[0]
1
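Two patterns are often useful alongside the calls above: get() for keys that may be missing, and looping over the views directly. This is a minimal sketch that rebuilds the same dictionary so it runs on its own.

```python
d = {'a': 1, 'b': 2, 'c': 'asd', 0: 'haha'}

# get() returns a default instead of raising KeyError for a missing key.
missing = d.get('zzz')             # None
fallback = d.get('zzz', 'n/a')     # explicit default

# Views can be looped over directly; list() is only needed for indexing.
pairs = []
for key, value in d.items():
    pairs.append((key, value))

first_key = next(iter(d))          # first key in insertion order
print(missing, fallback, first_key, pairs)
```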
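A dict is like a phone book with an index: lookups are very fast. Compared with a list, a dict trades space for time:
The keys of a dict must be immutable (more precisely, hashable), so a list cannot be a key.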
{['a']:3}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[171], line 1
----> 1 {['a']:3}

TypeError: unhashable type: 'list'
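When a key needs several parts, a tuple can be used where a list cannot. The sketch below uses an illustrative (stock code, date) key; the dictionary name prices and its layout are assumptions made for this example.

```python
# A tuple of immutable values is hashable, so it can serve as a key.
prices = {('000001', '2015-01-05'): 1307.737}
prices[('000001', '2015-01-06')] = 1288.146

lookup = prices[('000001', '2015-01-05')]
print(lookup)

# Converting a list to a tuple makes it usable as a key.
key_as_list = ['000001', '2015-01-06']
print(prices[tuple(key_as_list)])
```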
A set is also a collection of keys (so it cannot contain duplicates), but it stores no values.
s = set([1,1,1,2,3,4,4])
s
{1, 2, 3, 4}
s.add(5)
s
{1, 2, 3, 4, 5}
s.remove(3)
s
{1, 2, 4, 5}
ss = set([4,5,6])
s = set([5,6,7])
s.intersection(ss)
{5, 6}
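Besides intersection(), sets support the other usual set operations, and each has an operator form. A short sketch using the same two sets:

```python
ss = {4, 5, 6}
s = {5, 6, 7}

print(s & ss)    # intersection, same as s.intersection(ss)
print(s | ss)    # union
print(s - ss)    # elements in s but not in ss
print(s ^ ss)    # elements in exactly one of the two sets

# A common use: removing duplicates from a list (the original order is not kept).
unique = set([1, 1, 1, 2, 3, 4, 4])
print(unique)
```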
A common mistake (I have made it many times myself) is trying to change the contents of a string directly. But strings are immutable!
list_ = [3,1,2]
list_.sort()
list_
[1, 2, 3]
The list itself has changed.
a = 'xyz'
a.replace('x','X')
'Xyz'
a
'xyz'
a itself has not changed.
b = a.replace('x','X')
b
'Xyz'
The replace method is called on 'xyz', but it does not change the contents of 'xyz'. Instead, it returns a new string, 'Xyz'.
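The contrast between in-place methods and methods that return a new object can be summarised in one sketch. The names new_numbers and in_place_result are introduced here for illustration.

```python
a = 'xyz'
try:
    a[0] = 'X'                 # strings do not support item assignment
except TypeError as err:
    print('TypeError:', err)

a = a.replace('x', 'X')        # rebind the name to the new string instead
print(a)

# Lists offer both styles: sort() changes the list in place and returns None,
# while sorted() leaves the list alone and returns a new one.
numbers = [3, 1, 2]
new_numbers = sorted(numbers)
in_place_result = numbers.sort()
print(new_numbers, numbers, in_place_result)
```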
abs(-1)
1
?abs
Signature: abs(x, /)
Docstring: Return the absolute value of the argument.
Type:      builtin_function_or_method
abs?
Signature: abs(x, /)
Docstring: Return the absolute value of the argument.
Type:      builtin_function_or_method
import pandas as pd
?pd.read_csv
Signature: pd.read_csv( filepath_or_buffer: Union[ForwardRef('PathLike[str]'), str, IO[~T], io.RawIOBase, io.BufferedIOBase, io.TextIOBase, _io.TextIOWrapper, mmap.mmap], sep=<object object at 0x7fe41311c480>, delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal: str = '.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None, storage_options: Optional[Dict[str, Any]] = None, ) Docstring: Read a comma-separated values (csv) file into DataFrame. Also supports optionally iterating or breaking of the file into chunks. Additional help can be found in the online docs for `IO Tools <https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html>`_. Parameters ---------- filepath_or_buffer : str, path object or file-like object Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv. If you want to pass in a path object, pandas accepts any ``os.PathLike``. By file-like object, we refer to objects with a ``read()`` method, such as a file handle (e.g. via builtin ``open`` function) or ``StringIO``. sep : str, default ',' Delimiter to use. 
If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python's builtin sniffer tool, ``csv.Sniffer``. In addition, separators longer than 1 character and different from ``'\s+'`` will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'``. delimiter : str, default ``None`` Alias for sep. header : int, list of int, default 'infer' Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to ``header=0`` and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to ``header=None``. Explicitly pass ``header=0`` to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if ``skip_blank_lines=True``, so ``header=0`` denotes the first line of data rather than the first line of the file. names : array-like, optional List of column names to use. If the file contains a header row, then you should explicitly pass ``header=0`` to override the column names. Duplicates in this list are not allowed. index_col : int, str, sequence of int / str, or False, default ``None`` Column(s) to use as the row labels of the ``DataFrame``, either given as string name or column index. If a sequence of int / str is given, a MultiIndex is used. Note: ``index_col=False`` can be used to force pandas to *not* use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line. 
usecols : list-like or callable, optional Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in `names` or inferred from the document header row(s). For example, a valid list-like `usecols` parameter would be ``[0, 1, 2]`` or ``['foo', 'bar', 'baz']``. Element order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]``. To instantiate a DataFrame from ``data`` with element order preserved use ``pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]`` for columns in ``['foo', 'bar']`` order or ``pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]`` for ``['bar', 'foo']`` order. If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be ``lambda x: x.upper() in ['AAA', 'BBB', 'DDD']``. Using this parameter results in much faster parsing time and lower memory usage. squeeze : bool, default False If the parsed data only contains one column then return a Series. prefix : str, optional Prefix to add to column numbers when no header, e.g. 'X' for X0, X1, ... mangle_dupe_cols : bool, default True Duplicate columns will be specified as 'X', 'X.1', ...'X.N', rather than 'X'...'X'. Passing in False will cause data to be overwritten if there are duplicate names in the columns. dtype : Type name or dict of column -> type, optional Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'} Use `str` or `object` together with suitable `na_values` settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion. engine : {'c', 'python'}, optional Parser engine to use. The C engine is faster while the python engine is currently more feature-complete. 
converters : dict, optional Dict of functions for converting values in certain columns. Keys can either be integers or column labels. true_values : list, optional Values to consider as True. false_values : list, optional Values to consider as False. skipinitialspace : bool, default False Skip spaces after delimiter. skiprows : list-like, int or callable, optional Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be ``lambda x: x in [0, 2]``. skipfooter : int, default 0 Number of lines at bottom of file to skip (Unsupported with engine='c'). nrows : int, optional Number of rows of file to read. Useful for reading pieces of large files. na_values : scalar, str, list-like, or dict, optional Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'. keep_default_na : bool, default True Whether or not to include the default NaN values when parsing the data. Depending on whether `na_values` is passed in, the behavior is as follows: * If `keep_default_na` is True, and `na_values` are specified, `na_values` is appended to the default NaN values used for parsing. * If `keep_default_na` is True, and `na_values` are not specified, only the default NaN values are used for parsing. * If `keep_default_na` is False, and `na_values` are specified, only the NaN values specified `na_values` are used for parsing. * If `keep_default_na` is False, and `na_values` are not specified, no strings will be parsed as NaN. Note that if `na_filter` is passed in as False, the `keep_default_na` and `na_values` parameters will be ignored. 
na_filter : bool, default True Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file. verbose : bool, default False Indicate number of NA values placed in non-numeric columns. skip_blank_lines : bool, default True If True, skip over blank lines rather than interpreting as NaN values. parse_dates : bool or list of int or names or list of lists or dict, default False The behavior is as follows: * boolean. If True -> try parsing the index. * list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column. * list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column. * dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo' If a column or index cannot be represented as an array of datetimes, say because of an unparsable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use ``pd.to_datetime`` after ``pd.read_csv``. To parse an index or column with a mixture of timezones, specify ``date_parser`` to be a partially-applied :func:`pandas.to_datetime` with ``utc=True``. See :ref:`io.csv.mixed_timezones` for more. Note: A fast-path exists for iso8601-formatted dates. infer_datetime_format : bool, default False If True and `parse_dates` is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x. keep_date_col : bool, default False If True and `parse_dates` specifies combining multiple columns then keep the original columns. date_parser : function, optional Function to use for converting a sequence of string columns to an array of datetime instances. The default uses ``dateutil.parser.parser`` to do the conversion. 
Pandas will try to call `date_parser` in three different ways, advancing to the next if an exception occurs: 1) Pass one or more arrays (as defined by `parse_dates`) as arguments; 2) concatenate (row-wise) the string values from the columns defined by `parse_dates` into a single array and pass that; and 3) call `date_parser` once for each row using one or more strings (corresponding to the columns defined by `parse_dates`) as arguments. dayfirst : bool, default False DD/MM format dates, international and European format. cache_dates : bool, default True If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets. .. versionadded:: 0.25.0 iterator : bool, default False Return TextFileReader object for iteration or getting chunks with ``get_chunk()``. .. versionchanged:: 1.2 ``TextFileReader`` is a context manager. chunksize : int, optional Return TextFileReader object for iteration. See the `IO Tools docs <https://pandas.pydata.org/pandas-docs/stable/io.html#io-chunking>`_ for more information on ``iterator`` and ``chunksize``. .. versionchanged:: 1.2 ``TextFileReader`` is a context manager. compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer' For on-the-fly decompression of on-disk data. If 'infer' and `filepath_or_buffer` is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', or '.xz' (otherwise no decompression). If using 'zip', the ZIP file must contain only one data file to be read in. Set to None for no decompression. thousands : str, optional Thousands separator. decimal : str, default '.' Character to recognize as decimal point (e.g. use ',' for European data). lineterminator : str (length 1), optional Character to break file into lines. Only valid with C parser. quotechar : str (length 1), optional The character used to denote the start and end of a quoted item. 
Quoted items can include the delimiter and it will be ignored. quoting : int or csv.QUOTE_* instance, default 0 Control field quoting behavior per ``csv.QUOTE_*`` constants. Use one of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3). doublequote : bool, default ``True`` When quotechar is specified and quoting is not ``QUOTE_NONE``, indicate whether or not to interpret two consecutive quotechar elements INSIDE a field as a single ``quotechar`` element. escapechar : str (length 1), optional One-character string used to escape other characters. comment : str, optional Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as ``skip_blank_lines=True``), fully commented lines are ignored by the parameter `header` but not by `skiprows`. For example, if ``comment='#'``, parsing ``#empty\na,b,c\n1,2,3`` with ``header=0`` will result in 'a,b,c' being treated as the header. encoding : str, optional Encoding to use for UTF when reading/writing (ex. 'utf-8'). `List of Python standard encodings <https://docs.python.org/3/library/codecs.html#standard-encodings>`_ . .. versionchanged:: 1.2 When ``encoding`` is ``None``, ``errors="replace"`` is passed to ``open()``. Otherwise, ``errors="strict"`` is passed to ``open()``. This behavior was previously only the case for ``engine="python"``. dialect : str or csv.Dialect, optional If provided, this parameter will override values (default or not) for the following parameters: `delimiter`, `doublequote`, `escapechar`, `skipinitialspace`, `quotechar`, and `quoting`. If it is necessary to override values, a ParserWarning will be issued. See csv.Dialect documentation for more details. error_bad_lines : bool, default True Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. 
If False, then these "bad lines" will dropped from the DataFrame that is returned. warn_bad_lines : bool, default True If error_bad_lines is False, and warn_bad_lines is True, a warning for each "bad line" will be output. delim_whitespace : bool, default False Specifies whether or not whitespace (e.g. ``' '`` or ``' '``) will be used as the sep. Equivalent to setting ``sep='\s+'``. If this option is set to True, nothing should be passed in for the ``delimiter`` parameter. low_memory : bool, default True Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types either set False, or specify the type with the `dtype` parameter. Note that the entire file is read into a single DataFrame regardless, use the `chunksize` or `iterator` parameter to return the data in chunks. (Only valid with C parser). memory_map : bool, default False If a filepath is provided for `filepath_or_buffer`, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead. float_precision : str, optional Specifies which converter the C engine should use for floating-point values. The options are ``None`` or 'high' for the ordinary converter, 'legacy' for the original lower precision pandas converter, and 'round_trip' for the round-trip converter. .. versionchanged:: 1.2 storage_options : dict, optional Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by ``fsspec``, e.g., starting "s3://", "gcs://". An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values. .. 
versionadded:: 1.2 Returns ------- DataFrame or TextParser A comma-separated values (csv) file is returned as two-dimensional data structure with labeled axes. See Also -------- DataFrame.to_csv : Write DataFrame to a comma-separated values (csv) file. read_csv : Read a comma-separated values (csv) file into DataFrame. read_fwf : Read a table of fixed-width formatted lines into DataFrame. Examples -------- >>> pd.read_csv('data.csv') # doctest: +SKIP File: ~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/parsers.py Type: function
abs(1,2,3)
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[191], line 1 ----> 1 abs(1,2,3) TypeError: abs() takes exactly one argument (3 given)
def foo(n):
    if n < 10:
        return n
    else:
        return -n
foo(1)
1
foo(12)
-12
from foooo import my_foo
my_foo(12)
-12
pass
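pass does nothing and is often used as a placeholder. For example, you may already know what a piece of code should do but have not had time to write it; pass keeps that part from raising an error.
def bar():
    """
    Description: This function ....
    """
    pass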
bar()
def foo(n):
    if n < 10:
        return n
    else:
        return -n, -n+1
foo(21)
(-21, -20)
Note that the value returned here is a tuple. It can be unpacked on assignment:
a, b = foo(21)
a
-21
b
-20
If a function has no return statement, it returns None.
def foo(n):
    if n > 10:
        print(n)
a = foo(20)
20
a
def my_power(x):
    return x*x
# Pass the argument by position
my_power(2)
4
# Pass the argument by name
my_power(x=3)
9
my_power(y=3)
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[210], line 1 ----> 1 my_power(y=3) TypeError: my_power() got an unexpected keyword argument 'y'
To compute a cube we could write another function, but a better approach is to add a parameter so that any exponent can be passed in.
def my_power(x, n):
    return x**n
my_power(2, 3)
8
my_power(2)
# The earlier call no longer works: the number of arguments is wrong.
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[213], line 1 ----> 1 my_power(2) TypeError: my_power() missing 1 required positional argument: 'n'
A better approach is to give the parameter a default value.
def my_power(x,n=2):
    return x**n
my_power(2)
4
Note: parameters with default values must come after the required parameters.
def enroll(name, gender, city='Tianjin', nation='China'):
    print('name: ', name)
    print('gender: ', gender)
    print('city: ', city)
    print('nation: ', nation)
enroll('Michael','Male')
name: Michael gender: Male city: Tianjin nation: China
enroll('Michell', 'Female', 'Beijing')
name: Michell gender: Female city: Beijing nation: China
Default arguments have a pitfall:
def my_append(lst=[]):
    lst.append('asd')
    return lst
my_append([1,2,3])
[1, 2, 3, 'asd']
my_append(['a','b','c'])
['a', 'b', 'c', 'asd']
my_append()
['asd']
my_append()
['asd', 'asd']
my_append()
['asd', 'asd', 'asd']
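The default value of lst is bound to the list [] once, when the function is defined. Because lst refers to that single mutable object, any call that modifies it leaves the change in place, and the next call that relies on the default sees the modified list.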
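One way to observe this binding directly, as a small sketch: a function keeps its default values in its __defaults__ attribute, and the list stored there is the same object on every call. The name append_marker is hypothetical and used only for this illustration.

```python
def append_marker(lst=[]):  # hypothetical name; the default list is created once, at definition time
    lst.append('asd')
    return lst

stored_default = append_marker.__defaults__[0]  # the single shared default list
first = append_marker()
second = append_marker()

print(first is second)             # True: both calls returned the same list object
print(first is stored_default)     # True: that object is the stored default itself
print(append_marker.__defaults__)  # (['asd', 'asd'],)
```

Calls that pass their own list never touch the stored default, which is why only the argument-free calls accumulate values.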
def my_append(lst=None):
    if lst is None:
        lst = []
    lst.append('asd')
    return lst
my_append()
['asd']
my_append()
['asd']
my_append([1,2,3])
[1, 2, 3, 'asd']
To avoid this pitfall when programming, prefer immutable objects as default values, for example a tuple:
def my_append(lst=()):
    lst = lst+('asd',)
    return lst
my_append()
('asd',)
my_append()
('asd',)
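If the number of input arguments is not known in advance, how can the function be defined?
def my_sum(lst):
    total = 0  # avoid shadowing the built-in sum
    for n in lst:
        total = total + n
    return total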
my_sum([1,2,3,4,5,])
15
args: arguments
def my_sum(*args):  # args receives the positional arguments as a tuple
    total = 0  # avoid shadowing the built-in sum
    for n in args:
        total = total + n
    return total
my_sum(1,2,3)
6
my_sum(1,2,3,4,5,6)
21
my_sum()
0
# If we already have a list and want to pass it to a function that takes variable arguments
a = [1,2,3]
my_sum(a[0],a[1],a[2])
6
my_sum(*a)
6
kwargs: keyword arguments
def person(name, age, **kwargs):
    print('name: ', name, 'age: ', age, 'other: ', kwargs)
person('asd',13, city='Tianjin', nation='China', adasd='asdsad')
name: asd age: 13 other: {'city': 'Tianjin', 'nation': 'China', 'adasd': 'asdsad'}
date_info = {'year': "2021", 'month': '01', 'day': '03'}
author_info = {'author': 'Ronald Coase', 'article': 'The Nature of Firms'}
filename = '{year}-{month}-{day}: {author}, {article}'.format(**date_info,**author_info)
filename
'2021-01-03: Ronald Coase, The Nature of Firms'
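To restrict which keyword argument names are accepted, use the separator * (named keyword-only parameters).
def person(name, age, *, city, nation):  # only city and nation are accepted as keyword arguments
    print('name: ', name, 'age: ', age,
          'city: ', city, 'nation: ', nation)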
person('michael',16, city='Tianjin', nation='China')
name: michael age: 16 city: Tianjin nation: China
person('michael',16, city='Tianjin', Nation='China')
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[246], line 1 ----> 1 person('michael',16, city='Tianjin', Nation='China') TypeError: person() got an unexpected keyword argument 'Nation'
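If the function already has a variable-length parameter (*args), the parameters after it do not need the separator *. They are still keyword-only: they must be passed by name, not by position.
def person(name, age, *args, city, nation):
    print('name: ', name, 'age: ', age,
          *args,
          'city: ', city, 'nation: ', nation)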
person('Jack', 15, 'Earth', 'Sun','universe', city='Beijing', nation='China',asdasd='asdasda')
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[248], line 1 ----> 1 person('Jack', 15, 'Earth', 'Sun','universe', city='Beijing', nation='China',asdasd='asdasda') TypeError: person() got an unexpected keyword argument 'asdasd'
person('Jack', 15, 'Earth', 'Sun','Beijing','China')
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[249], line 1 ----> 1 person('Jack', 15, 'Earth', 'Sun','Beijing','China') TypeError: person() missing 2 required keyword-only arguments: 'city' and 'nation'
person('Jack', 15, 'Earth', 'Sun','Beijing','China',city='Beijing', nation='China')
name: Jack age: 15 Earth Sun Beijing China city: Beijing nation: China
def foo(*, city, nation):
    print(city)
    print(nation)
foo('Beijing','China')
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[252], line 1 ----> 1 foo('Beijing','China') TypeError: foo() takes 0 positional arguments but 2 were given
foo(city='Beijing',nation='China')
Beijing China
def foo(city, nation):
    print(city)
    print(nation)
foo('Beijing','China')
Beijing China
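A Python function can be defined with required parameters, default parameters, variable-length parameters (*args), keyword parameters (**kwargs), and named keyword-only parameters, and all five kinds can be combined. Note, however, that they must be declared in this order: required parameters, default parameters, variable-length parameters, named keyword-only parameters, and keyword parameters.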
def f1(a, b, c=0, *args, **kw):
    pass
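The ordering rule can be illustrated with one signature that uses all five kinds of parameter. This is only a sketch; the function name describe and its parameter names are hypothetical.

```python
def describe(a, b=0, *args, c, **kw):
    # a: required, b: default, args: extra positionals,
    # c: keyword-only (it follows *args), kw: any other keyword arguments
    return {'a': a, 'b': b, 'args': args, 'c': c, 'kw': kw}

result = describe(1, 2, 3, 4, c=5, d=6)
print(result)
# {'a': 1, 'b': 2, 'args': (3, 4), 'c': 5, 'kw': {'d': 6}}
```

The first two positional values fill a and b, the remaining ones are collected into args, c must be named, and the unknown keyword d ends up in kw.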
A function that calls itself inside its own definition is a recursive function.
def fact(n):
    if n == 1:
        return 1
    return n * fact(n-1)
fact(1)
1
fact(5)
120
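A recursive function only terminates if every call eventually reaches the base case. As a hedged sketch, the variant below (hypothetical name safe_fact) widens the base case to cover 0 and rejects negative input, and the result is compared with the standard library's math.factorial.

```python
import math

def safe_fact(n):
    # Guard: without it, a negative n would never reach the base case
    if n < 0:
        raise ValueError('n must be non-negative')
    if n <= 1:          # base case covers both 0 and 1
        return 1
    return n * safe_fact(n - 1)

print(safe_fact(5))                          # 120
print(safe_fact(0))                          # 1
print(safe_fact(10) == math.factorial(10))   # True
```

Python also limits recursion depth (see sys.getrecursionlimit()), so very large n would need a loop instead.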
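iterable and iterator
- iter() applied to any iterable returns an iterator.
- next() applied to an iterator returns its next element.
Iteration in Python is convenient: list, tuple, dict, string and many other types can be used directly in for ... in.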
for letter in 'asbdasdas':
    print(letter)
a s b d a s d a s
d = {'a':1, 'b':123, 'c':'asd'}
for i in d:
    print(i)
a b c
for value in d.values():
    print(value)
1 123 asd
d.items()
dict_items([('a', 1), ('b', 123), ('c', 'asd')])
for key, value in d.items():
    print(key, value)
a 1 b 123 c asd
list_ = [1,2,3,4,1,23,]
for e in list_:
    print(e)
1 2 3 4 1 23
enumerate is also commonly used:
list(enumerate(list_))
[(0, 1), (1, 2), (2, 3), (3, 4), (4, 1), (5, 23)]
for i, value in enumerate(list_):
    print(i, value)
0 1 1 2 2 3 3 4 4 1 5 23
l = iter(list_)
next(l)
1
next(l)
2
for e in l:
    print(e)
3 4 1 23
Note: l above has already been advanced twice by next(), so for e in l: starts from 3.
next(l)
--------------------------------------------------------------------------- StopIteration Traceback (most recent call last) Cell In[274], line 1 ----> 1 next(l) StopIteration:
l = iter(list_)
iter(l) # calling iter() on an iterator returns the iterator itself
<list_iterator at 0x7fe4123f7eb0>
l
<list_iterator at 0x7fe4123f7eb0>
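The distinction between an iterable and an iterator can also be checked programmatically. The sketch below uses the abstract base classes Iterable and Iterator from the standard library's collections.abc; the variable names are illustrative only.

```python
from collections.abc import Iterable, Iterator

nums = [1, 2, 3]
it = iter(nums)

print(isinstance(nums, Iterable))  # True: a list can be iterated over
print(isinstance(nums, Iterator))  # False: but a list has no __next__
print(isinstance(it, Iterator))    # True: iter() produced an iterator
print(isinstance(it, Iterable))    # True: every iterator is also iterable
print(iter(it) is it)              # True: iter() on an iterator returns itself
```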
l = []
for x in range(10):
    l.append(x**2)
l
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[x**2 for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[x**2 for x in range(10) if (x%2==0) and (x != 6)]
[0, 4, 16, 64]
Once you are fluent with this syntax, many tasks become simpler to write, for example:
ss = ['a','b','c']
[0,1,2]
[0, 1, 2]
'a'+'b'
'ab'
[s+f'_{n}' for s in ss for n in range(3)]
['a_0', 'a_1', 'a_2', 'b_0', 'b_1', 'b_2', 'c_0', 'c_1', 'c_2']
# Equivalent to
l_ = []
for s in ss:
    for n in range(3):
        l_.append(s+f'_{n}')
l_
['a_0', 'a_1', 'a_2', 'b_0', 'b_1', 'b_2', 'c_0', 'c_1', 'c_2']
[s+f'_{n+1}' for n,s in enumerate(ss)]
['a_1', 'b_2', 'c_3']
The zip function
ss
['a', 'b', 'c']
?zip
Init signature: zip(self, /, *args, **kwargs) Docstring: zip(*iterables) --> A zip object yielding tuples until an input is exhausted. >>> list(zip('abcdefg', range(3), range(4))) [('a', 0, 0), ('b', 1, 1), ('c', 2, 2)] The zip object yields n-length tuples, where n is the number of iterables passed as positional arguments to zip(). The i-th element in every tuple comes from the i-th iterable argument to zip(). This continues until the shortest argument is exhausted. Type: type Subclasses:
zip(ss,range(3))
<zip at 0x7fe3baf824c0>
list(zip(ss,range(3)))
[('a', 0), ('b', 1), ('c', 2)]
list(zip(('a', 0), ('b', 1), ('c', 2)))
[('a', 'b', 'c'), (0, 1, 2)]
list(zip(*zip(ss,range(3))))
[('a', 'b', 'c'), (0, 1, 2)]
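The call above reverses a zip. When * is used on an argument in a function call, it unpacks the elements of a list or tuple into separate arguments, so the result of zip(ss,range(3)), namely [('a', 0), ('b', 1), ('c', 2)], is unpacked into ('a', 0), ('b', 1), ('c', 2).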
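As a short sketch of the same idea, a list of pairs can be split into two tuples with zip(*...) and then zipped back together. The variable names below are illustrative only.

```python
pairs = [('a', 0), ('b', 1), ('c', 2)]

# zip(*pairs) is the same call as zip(('a', 0), ('b', 1), ('c', 2))
letters, numbers = zip(*pairs)
print(letters)   # ('a', 'b', 'c')
print(numbers)   # (0, 1, 2)

# Zipping the two tuples again restores the original pairs
rebuilt = list(zip(letters, numbers))
print(rebuilt == pairs)  # True
```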
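What a generator is for: it does not produce all of the data at once; following its algorithm, it generates each value only when it is needed.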
(x for x in range(10))
<generator object <genexpr> at 0x7fe3bafff970>
g = (x for x in range(10))
g[0]
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[296], line 1 ----> 1 g[0] TypeError: 'generator' object is not subscriptable
next(g)
0
next(g)
1
next(g)
2
for x in g:
    print(x)
3 4 5 6 7 8 9
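Often the algorithm that generates a sequence is perfectly clear, yet producing the whole sequence at once is impractical. The Fibonacci sequence is an example.
Use yield to build a generator.
def fib(max_num):
    n, a, b = 0, 0, 1
    while n < max_num:
        print(b)
        a, b = b, a+b  # equivalent to a, b = (b, a+b)
        # or: t = (b, a+b)
        #     a = t[0]
        #     b = t[1]
        n = n + 1
    return 'done'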
fib(10)
1 1 2 3 5 8 13 21 34 55
'done'
Rewritten as a generator:
def fib(max_num):
    n, a, b = 0, 0, 1
    while n < max_num:
        yield b
        a, b = b, a+b
        n = n + 1
    return 'done'
f = fib(10)
next(f)
1
next(f)
1
for num in f:
    print(num)
2 3 5 8 13 21 34 55
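A for loop silently discards the value that a generator passes to return. If that value is needed, it can be read from the StopIteration exception raised when the generator finishes. The sketch below repeats the fib generator so that it is self-contained; the helper variables are illustrative only.

```python
def fib(max_num):
    n, a, b = 0, 0, 1
    while n < max_num:
        yield b
        a, b = b, a + b
        n = n + 1
    return 'done'

gen = fib(3)
values = []
while True:
    try:
        values.append(next(gen))
    except StopIteration as stop:
        result = stop.value   # the value given to return inside the generator
        break

print(values)  # [1, 1, 2]
print(result)  # done
```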
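How a generator differs from an ordinary function:
- A function runs straight through and hands back its value when it reaches return or its last line.
- A generator runs only when next is called; when it reaches yield it emits a value and pauses, and the following call to next resumes from where it paused.
range is also a lazy iterable, but it is not itself an iterator.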
range(10)
range(0, 10)
next(range(10))
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[309], line 1 ----> 1 next(range(10)) TypeError: 'range' object is not an iterator
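Because range is an iterable rather than an iterator, an iterator has to be requested from it with iter() before next() can be used. A minimal sketch of the difference follows; the variable names are illustrative only.

```python
r = range(3)
it = iter(r)            # ask the range for an iterator

first = next(it)        # works on the iterator
rest = list(it)         # consumes what is left
exhausted = list(it)    # the iterator is now empty
again = list(r)         # the range itself can be iterated again from the start

print(first)      # 0
print(rest)       # [1, 2]
print(exhausted)  # []
print(again)      # [0, 1, 2]
```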
abs(-2)
2
abs
<function abs(x, /)>
display(abs)
<function abs(x, /)>
print(abs)
<built-in function abs>
foo = abs
foo(-2)
2
abs = 'abs' # allowed, but best avoided
abs(2)
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[317], line 1 ----> 1 abs(2) TypeError: 'str' object is not callable
del abs
abs(2)
2
Functions as arguments
def foo(x1,x2,func):
    return func(x1) + func(x2)
foo(1,2,abs)
3
foo(-1,4,abs)
5
import numpy as np
foo(123,14,np.log)
7.451241684987675
map takes two arguments here: a function and an iterable. It applies the function to each element.
map(abs, [1,2,-3,-5])
<map at 0x7fe3baf49370>
m = map(abs, [1,2,-3,-5])
list(m)
[1, 2, 3, 5]
np.abs([1,2,-3,-5])
array([1, 2, 3, 5])
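reduce also applies a function to a sequence, but cumulatively: the function takes two arguments, and each result is combined with the next element.
reduce(f,[1,2,3,4]) = f(f(f(1,2),3),4)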
from functools import reduce
def foo(x,y):
    return x + y
reduce(foo,[1,2,3,4,5])
15
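To see the intermediate steps of such a reduction, itertools.accumulate from the standard library yields every partial result. The sketch below also shows the optional third argument of reduce, an initial value, which makes reducing an empty sequence well defined. The variable names are illustrative only.

```python
from functools import reduce
from itertools import accumulate
from operator import add

numbers = [1, 2, 3, 4, 5]

steps = list(accumulate(numbers, add))   # every partial sum
total = reduce(add, numbers)             # only the final result
empty_total = reduce(add, [], 0)         # the initial value 0 is returned for an empty list

print(steps)        # [1, 3, 6, 10, 15]
print(total)        # 15
print(empty_total)  # 0
```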
filter also takes a function and a sequence; it keeps the elements for which the function returns True.
def is_odd(x):
    return x % 2 == 1
filter(is_odd, [1,2,3,4,5,6,7,9])
<filter at 0x7fe3bb022850>
list(filter(is_odd, [1,2,3,4,5,6,7,9]))
[1, 3, 5, 7, 9]
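A lambda function does not need a name. It is useful when a more complex operation needs a simple custom function as an intermediate step. For example, in data manipulation you may group the data first and then apply some arithmetic to each group; that arithmetic can be defined with a lambda.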
lambda x: x**3
<function __main__.<lambda>(x)>
f = lambda x: x**3
f(2)
8
f = lambda x,y: x*10-y
f(3,1)
29
d = [('a', 100), ('b', 23), ('c', 1890)]
?sorted
Signature: sorted(iterable, /, *, key=None, reverse=False) Docstring: Return a new list containing all items from the iterable in ascending order. A custom key function can be supplied to customize the sort order, and the reverse flag can be set to request the result in descending order. Type: builtin_function_or_method
sorted(d, key=lambda x: x[1]) # sort by the second element (the value) of each tuple
[('b', 23), ('a', 100), ('c', 1890)]
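Two small variations on this sort, offered as a sketch: operator.itemgetter from the standard library can stand in for the lambda, and reverse=True gives descending order. The variable names are illustrative only.

```python
from operator import itemgetter

d = [('a', 100), ('b', 23), ('c', 1890)]

by_value = sorted(d, key=itemgetter(1))                      # same as key=lambda x: x[1]
by_value_desc = sorted(d, key=lambda x: x[1], reverse=True)  # largest value first

print(by_value)       # [('b', 23), ('a', 100), ('c', 1890)]
print(by_value_desc)  # [('c', 1890), ('a', 100), ('b', 23)]
print(d)              # sorted returns a new list; d is unchanged
```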
A lambda can be viewed as the smallest unit of a function. A complex function can be decomposed into individual lambdas.
my_add = lambda x: lambda y: x+y
my_add(4)(5)
9
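Use case: sometimes we need to change an existing function, for example to add some behavior, without modifying the function itself. In that case we can add a "decoration" to the function.
def log(func):
    def wrapper(*args, **kw):
        print('call %s():' % func.__name__)
        return func(*args, **kw)
    return wrapper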
log(print)
<function __main__.log.<locals>.wrapper(*args, **kw)>
What log(print) returns above is the inner function wrapper.
wrapper()
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[345], line 1 ----> 1 wrapper() NameError: name 'wrapper' is not defined
log(print)('hello')
call print(): hello
def cat_miao():
    print('miao miao')
log(cat_miao)()
call cat_miao(): miao miao
@log
def cat_miao():
    print('miao miao')
cat_miao()
call cat_miao(): miao miao
Placing @log at the definition of cat_miao() is equivalent to:
cat_miao = log(cat_miao)
After this, cat_miao refers to the new, log-decorated version of cat_miao.
def print_func_name(func):
    def wrap_1():
        print("Now use function '{}'".format(func.__name__))
        func()
    return wrap_1
def print_time(func):
    import time
    def wrap_2():
        print("Now the time is {}".format(int(time.time())))
        func()
    return wrap_2
@print_func_name
@print_time
def dog_bark():
    print("Bark !!!")
dog_bark()
Now use function 'wrap_2' Now the time is 1676208997 Bark !!!
dog_bark -->
`dog_bark` with `print time` (wrap_2) -->
`print func name` with (`dog_bark` with `print time`)
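The order in which stacked decorators are applied and run can be made visible by recording each step in a list. This is only a sketch; the names outer, inner, target, and calls are hypothetical.

```python
calls = []

def outer(func):
    def wrap_outer():
        calls.append('outer')
        func()
    return wrap_outer

def inner(func):
    def wrap_inner():
        calls.append('inner')
        func()
    return wrap_inner

@outer   # applied second: target = outer(inner(target))
@inner   # applied first, because it is closest to the function
def target():
    calls.append('target')

target()
print(calls)            # ['outer', 'inner', 'target']
print(target.__name__)  # wrap_outer
```

The decorator nearest the def wraps the function first, and the outermost wrapper is the one that runs first when the function is called.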
def log(text):
    def decorator(func):
        def wrapper(*args, **kwargs):
            print('{} {}(): '.format(text, func.__name__))
            return func(*args, **kwargs)
        return wrapper
    return decorator
@log('execute')
def cat_miao():
    print('miao miao')
cat_miao()
execute cat_miao(): miao miao
There is one problem: the name of cat_miao has been changed by the decorator.
cat_miao.__name__
'wrapper'
To keep the original name of func on the wrapper, write it like this:
import functools
def log(text):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            print('{} {}(): '.format(text, func.__name__))
            return func(*args, **kwargs)
        return wrapper
    return decorator
@log('execute')
def cat_miao():
    print('miao miao')
cat_miao.__name__
'cat_miao'
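def log(text):
    def decorator(func):
        def wrapper(*args, **kwargs):
            print('{} {}(): '.format(text, func.__name__))
            return func(*args, **kwargs)
        return wrapper
    return decorator
The inner function wrapper uses text, a variable that belongs to the enclosing function log. This structure is called a closure. An example of a closure: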
def my_height():
    height = 10
    def print_height():
        print("The height is: %d" % height)
    return print_height
h = my_height()
h
<function __main__.my_height.<locals>.print_height()>
h()
The height is: 10
Closures exist in many languages and are important in program design, although we rarely need them in data manipulation.
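A closure can also keep state between calls. The sketch below is a hypothetical counter, not part of the original notes; it uses the nonlocal statement so that the inner function can rebind the enclosing variable.

```python
def make_counter():
    count = 0                 # lives in the enclosing scope
    def increment():
        nonlocal count        # rebind the enclosing variable instead of creating a local one
        count += 1
        return count
    return increment

counter_a = make_counter()
counter_b = make_counter()

print(counter_a())  # 1
print(counter_a())  # 2
print(counter_b())  # 1: each closure has its own count
```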
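Sometimes we want to reuse an existing function but find that it has too many parameters. We would like to fix some of them to preset values and define the result as a new function.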
int('12314')
12314
int('0101010101111', base=2)
2735
import functools
int2 = functools.partial(int, base=2)
int2('100001101001')
2153
int2('100001101001', base=8)
8590230017
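A partial object remembers the wrapped function and the preset arguments, and a keyword passed at call time overrides the preset one. The sketch below inspects these through the documented func and keywords attributes.

```python
import functools

int2 = functools.partial(int, base=2)

print(int2.func is int)     # True: the wrapped function
print(int2.keywords)        # {'base': 2}: the preset keyword arguments
print(int2('1010'))         # 10
print(int2('17', base=8))   # 15: the call-time keyword overrides the preset
```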
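- A .py file is a module.
- A package wraps modules in one more layer, so a module's name becomes that of a module under the package, for example my_package.xyz.
- __init__.py tells Python that the directory is a package. It can be empty or contain code.
- A package can also contain multiple levels of structure.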
import my_package
my_package.xyz.my_print()
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[371], line 1 ----> 1 my_package.xyz.my_print() AttributeError: module 'my_package' has no attribute 'xyz'
The reason is that sub-modules are not imported automatically unless this is requested explicitly (for example in __init__.py).
import my_package.xyz as xyz
xyz.my_print()
Roses are red, Violets are blue, Whatever you say, is always true. asdasd
To have xyz imported automatically whenever my_package is imported, add the following to __init__.py:
import my_package.xyz
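Note this statement:
if __name__=='__main__':
    my_print()
What it does:
- When the .py file is run directly, the Python interpreter sets the special variable __name__ to '__main__', so the statements under the if are executed.
- When the .py file is imported from somewhere else, the if condition is False and the statements inside it are not run.
If the module is modified after it has been imported, the result does not change: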
xyz.my_print()
Roses are red, Violets are blue, Whatever you say, is always true. asdasd
To re-import it:
from importlib import reload
reload(my_package)
<module 'my_package' from '/Users/Bert/work/03-Learning/teaching/quant/lec_notes/lec1/2023/my_package/__init__.py'>
my_package.xyz.my_print()
Roses are red, Violets are blue, Whatever you say, is always true. asdasd
reload(my_package.xyz)
<module 'my_package.xyz' from '/Users/Bert/work/03-Learning/teaching/quant/lec_notes/lec1/2023/my_package/xyz.py'>
xyz.my_print()
Roses are red, Violets are blue, Whatever you say, is always true. asdasd
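- Names of the form __xxx__ are special variables. They can be referenced, but usually carry a special meaning.
- Names of the form _xxx or __xxx are non-public and should not be referenced directly from outside.
def _print1(x):
    return print(f'{x}')
def _print2(x):
    return print(f'{x} + {x}')
def hello(num):
    if num > 3:
        return _print1(num)
    else:
        return _print2(num)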
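Hiding the internal logic in this way is a useful form of abstraction and encapsulation.
Procedure-oriented programming:
Object-oriented programming: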
class Student(object):
    pass
# class Student():
#     pass
# class Student:
#     pass
object indicates that Student "inherits" from the class object (that is, it automatically has some of object's properties and functionality).
print(Student)
<class '__main__.Student'>
Student
__main__.Student
print(Student)
<class '__main__.Student'>
xiaoming = Student()
xiaoming
<__main__.Student at 0x7fe3bb08a3a0>
xiaoming.name = 'xiaoming'
xiaoming.sex = 'male'
xiaoming.name
'xiaoming'
xiaoming.sex
'male'
A class is a template, so the attributes that every instance must have can be set when the instance is created.
class Student(object):
    class_attr = 'class 8'
    def __init__(self, name, score):
        self.name = name
        self.score = score
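- The first parameter of __init__ is always self, which refers to the instance being created.
- Attributes created on self inside __init__ are bound to the instance itself.
- The remaining parameters of __init__ are the values that must be supplied when an instance is created.
- class_attr is an attribute of the class and is shared by all instances. Note: do not give a class attribute and an instance attribute the same name.
- Attributes defined inside __init__ are instance attributes; each instance has its own data.
xm = Student()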
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[392], line 1 ----> 1 xm = Student() TypeError: __init__() missing 2 required positional arguments: 'name' and 'score'
xm = Student(name='xiaoming',score=70)
xm.name
'xiaoming'
xm.score
70
xm.class_attr
'class 8'
xf = Student(name='xiaofang',score=90)
xf.class_attr
'class 8'
xm.name = 'xiaomingming'
xm.name
'xiaomingming'
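The warning about reusing a class attribute's name for an instance attribute can be illustrated with a short sketch: assigning through an instance creates a new instance attribute that shadows the class attribute for that instance only. The simplified Student class below is an assumption made for this illustration.

```python
class Student:
    class_attr = 'class 8'          # class attribute, shared by all instances

    def __init__(self, name):
        self.name = name            # instance attribute, one per instance

xm = Student('xiaoming')
xf = Student('xiaofang')

xm.class_attr = 'class 9'           # creates an instance attribute on xm only
print(xm.class_attr)                # class 9  (the instance attribute shadows the class one)
print(xf.class_attr)                # class 8
print(Student.class_attr)           # class 8  (the class attribute is untouched)

del xm.class_attr                   # remove the shadowing instance attribute
print(xm.class_attr)                # class 8  (the class attribute is visible again)
```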
Instance data can be accessed from outside:
print(xm.score)
70
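However, score lives inside xm, so there is no need to access it from outside. Moreover, unless you know exactly how score is defined, accessing it with an external function may go wrong. An ordinary user does not need to understand the full definition of an instance's internal data, only how to retrieve it. This kind of encapsulation works like a black box. The user only needs to know:
class Student(object):
    def __init__(self, name, score):
        self.name = name
        self.score = score
    def print_score(self):
        print('%s: %s' % (self.name, self.score))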
xm = Student('xiaoming',70)
xm.print_score()
xiaoming: 70
Many more methods can be added to a class.
class Student(object):
    def __init__(self, name, score):
        self.name = name
        self.score = score
    def print_score(self):
        print('%s: %s' % (self.name, self.score))
    def get_grade(self):
        if self.score >= 90:
            return 'A'
        elif self.score >= 60:
            return 'B'
        else:
            return 'C'
xm = Student('xiaoming',70)
xm.get_grade()
'B'
To prevent external code from accessing the internal data, access can be restricted.
class Student(object):
    def __init__(self, name, score):
        self.__name = name
        self.__score = score
    def print_score(self):
        print('%s: %s' % (self.__name, self.__score))
    def get_grade(self):
        if self.__score >= 90:
            return 'A'
        elif self.__score >= 60:
            return 'B'
        else:
            return 'C'
xm = Student('xiaoming', 80)
xm.__name
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[410], line 1 ----> 1 xm.__name AttributeError: 'Student' object has no attribute '__name'
xm.get_grade()
'B'
class Student(object):
    def __init__(self, name, score):
        self.__name = name
        self.__score = score
    def print_score(self):
        print('%s: %s' % (self.__name, self.__score))
    def get_grade(self):
        if self.__score >= 90:
            return 'A'
        elif self.__score >= 60:
            return 'B'
        else:
            return 'C'
    def get_name(self):
        return self.__name
    def get_score(self):
        return self.__score
xm = Student('xiaoming',70)
xm.__name
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[414], line 1 ----> 1 xm.__name AttributeError: 'Student' object has no attribute '__name'
xm.get_name()
'xiaoming'
Now name and score can be read, but they can no longer be changed from outside. If score should be changeable, define one more method.
xm.__name = 'Shoahsd'
xm.get_name()
'xiaoming'
class Student(object):
    def __init__(self, name, score):
        self.__name = name
        self.__score = score
    def print_score(self):
        print('%s: %s' % (self.__name, self.__score))
    def get_grade(self):
        if self.__score >= 90:
            return 'A'
        elif self.__score >= 60:
            return 'B'
        else:
            return 'C'
    def get_name(self):
        return self.__name
    def get_score(self):
        return self.__score
    def set_score(self, score):
        self.__score = score
Defining the methods that handle the data inside the class has two advantages:
class Student(object):
    def __init__(self, name, score):
        self.__name = name
        self.__score = score
    def print_score(self):
        print('%s: %s' % (self.__name, self.__score))
    def get_grade(self):
        if self.__score >= 90:
            return 'A'
        elif self.__score >= 60:
            return 'B'
        else:
            return 'C'
    def get_name(self):
        return self.__name
    def get_score(self):
        return self.__score
    def set_score(self, score):
        if 0 <= score <= 100:
            self.__score = score
        else:
            raise ValueError('score must be between 0 and 100.')
xm = Student('xm', 70)
xm.set_score(199)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[421], line 1 ----> 1 xm.set_score(199) Cell In[419], line 27, in Student.set_score(self, score) 25 self.__score = score 26 else: ---> 27 raise ValueError('score must be between 0 and 100.') ValueError: score must be between 0 and 100.
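The same validation can also be expressed with Python's built-in property, which is a different technique from the get_score/set_score methods used in these notes: the check runs on plain attribute assignment. The sketch below is one possible arrangement; storing the value in _score is an assumption made for this illustration.

```python
class Student:
    def __init__(self, name, score):
        self._name = name
        self.score = score            # goes through the setter below, so it is validated

    @property
    def score(self):
        return self._score

    @score.setter
    def score(self, value):
        if not 0 <= value <= 100:
            raise ValueError('score must be between 0 and 100.')
        self._score = value

xm = Student('xm', 70)
xm.score = 80                         # looks like attribute access, but calls the setter
print(xm.score)                       # 80

try:
    xm.score = 199
except ValueError as e:
    print('ValueError:', e)           # ValueError: score must be between 0 and 100.
```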
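A double-underscore attribute is in fact renamed by the Python interpreter to the form _ClassName__attribute (the exact name may differ between interpreter versions).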
xm._Student__name
'xm'
xm._Student__name = 'haha' # it can in fact be modified
xm.get_name()
'haha'
xm.__name = 'asdadsahds'
del xm._Student__name # or even deleted
xm._Student__name
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[427], line 1 ----> 1 xm._Student__name AttributeError: 'Student' object has no attribute '_Student__name'
But it is best not to do this!
Similarly, writing:
xm.__name = 'heyhey'
only adds a new attribute to xm; it is not the internal __name.
xm.get_name()
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[429], line 1 ----> 1 xm.get_name() Cell In[419], line 18, in Student.get_name(self) 17 def get_name(self): ---> 18 return self.__name AttributeError: 'Student' object has no attribute '_Student__name'
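The renaming can be seen by listing an instance's attributes with vars(). In the sketch below (a reduced Student class assumed for this illustration), the attribute set inside the class appears under its mangled name, while the one assigned from outside keeps the literal name __name.

```python
class Student:
    def __init__(self, name):
        self.__name = name        # inside the class body: stored as _Student__name

    def get_name(self):
        return self.__name        # also mangled, so it reads _Student__name

xm = Student('xm')
xm.__name = 'heyhey'              # outside the class body: no mangling, a separate attribute

print(vars(xm))                   # {'_Student__name': 'xm', '__name': 'heyhey'}
print(xm.get_name())              # xm
```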
When defining a class, it can inherit from an existing class.
class Animal(object):
    def __init__(self, legs=4, fur=True):
        self.legs = legs
        self.fur = fur
    def run(self):
        print('Animal is running...')
class Dog(Animal):
    pass
class Cat(Animal):
    pass
dog = Dog()
cat = Cat()
dog.run()
Animal is running...
dog.legs
4
cat.fur
True
Adding methods to a subclass:
class Dog(Animal):
    def eat(self):
        print('Eating meat...')
dog = Dog()
dog.eat()
Eating meat...
When a subclass and its parent class define a method with the same name, the subclass's method overrides the parent's.
class Dog(Animal):
    def eat(self):
        print('Eating meat...')
    def run(self):
        print('dog is running...')
class Cat(Animal):
    def run(self):
        print('cat is running...')
dog = Dog()
dog.run()
dog is running...
cat = Cat()
cat.run()
cat is running...
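An overriding method does not have to replace the parent's behavior entirely; it can call the parent's version through super() and extend it. The sketch below uses a hypothetical Bird subclass, and its run methods return strings instead of printing so that the result is easy to inspect.

```python
class Animal:
    def __init__(self, legs=4, fur=True):
        self.legs = legs
        self.fur = fur

    def run(self):
        return 'Animal is running...'

class Bird(Animal):                           # hypothetical subclass for illustration
    def __init__(self, can_fly=True):
        super().__init__(legs=2, fur=False)   # reuse the parent's initialization
        self.can_fly = can_fly

    def run(self):
        base = super().run()                  # call the overridden parent method
        return base + ' (on two legs)'

bird = Bird()
print(bird.legs, bird.fur, bird.can_fly)      # 2 False True
print(bird.run())                             # Animal is running... (on two legs)
```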
Every data type in Python is a kind of object. For example, str, list, tuple and so on each have their own attributes and methods. The same is true of Animal.
a = list()
b = Animal()
c = Dog()
isinstance(a, list)
True
isinstance(b, Animal)
True
isinstance(c, Dog)
True
isinstance(c, Animal)
True
c is both a Dog and an Animal.
isinstance(b, Dog)
False
But b is not a Dog.
The benefit of polymorphism:
def run_twice(animal):
    animal.run()
    animal.run()
run_twice(Animal())
Animal is running... Animal is running...
run_twice(Dog())
dog is running... dog is running...
run_twice(Cat())
cat is running... cat is running...
a = [1,2,3]
a.__len__()
3
# Define a new subclass
class Tortoise(Animal):
    def run(self):
        print('Tortoise is running slowly...')
run_twice(Tortoise())
Tortoise is running slowly... Tortoise is running slowly...
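So we can freely write a new subclass: as long as its run method is written correctly, we do not need to care how the existing code goes about calling it.
Another convenient feature of Python is duck typing.
That is, you can define a class that does not inherit from Animal at all; as long as it has the methods or attributes of Animal that are used, it can be treated as an Animal when called.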
class DuckTyping(object):
    def run(self):
        print('This is a duck typing')
run_twice(DuckTyping())
This is a duck typing This is a duck typing
This is a characteristic of "dynamic languages" such as Python, whereas some other languages enforce strict type requirements here.
print(type(10))
<class 'int'>
print(type('str'))
<class 'str'>
print(type(Animal()))
<class '__main__.Animal'>
print(type(Dog()))
<class '__main__.Dog'>
print(type(abs))
<class 'builtin_function_or_method'>
import types
type(run_twice) == types.FunctionType
True
type(abs) == types.BuiltinFunctionType
True
type(abs) == types.FunctionType
False
type(lambda x:x) == types.LambdaType
True
type((x for x in range(10))) == types.GeneratorType
True
isinstance(c, Animal)
True
isinstance(c, Dog)
True
isinstance(c, (Dog, str))
True
isinstance(123, int)
True
dir(types)
['AsyncGeneratorType', 'BuiltinFunctionType', 'BuiltinMethodType', 'CellType', 'ClassMethodDescriptorType', 'CodeType', 'CoroutineType', 'DynamicClassAttribute', 'FrameType', 'FunctionType', 'GeneratorType', 'GenericAlias', 'GetSetDescriptorType', 'LambdaType', 'MappingProxyType', 'MemberDescriptorType', 'MethodDescriptorType', 'MethodType', 'MethodWrapperType', 'ModuleType', 'SimpleNamespace', 'TracebackType', 'WrapperDescriptorType', '_GeneratorWrapper', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_calculate_meta', '_cell_factory', 'coroutine', 'new_class', 'prepare_class', 'resolve_bases']
dir(Animal)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'run']
dir('sasdfas')
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
'a' + 'b'
'ab'
'a'.__add__('b')
'ab'
import pandas as pd
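Many Python functions work across different types of data, for example len, which returns a length.
When len is applied to an object, it automatically calls that object's __len__ method.
len('abc')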
3
df = pd.DataFrame([[1,2,3,4,5],[5,4,3,2,1]])
df
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 |
| 1 | 5 | 4 | 3 | 2 | 1 |
len(df)
2
class MyHair(object):
    def __len__(self):
        return 100
len(MyHair())
100
getattr(), setattr(), hasattr()
dog = Dog()
hasattr(dog, 'legs')
True
hasattr(dog, 'tail')
False
setattr(dog, 'fur', 'yes')
dog.fur
'yes'
getattr(dog, 'fur')
'yes'
getattr(dog, 'tail', 'No tail')
'No tail'
hasattr(dog, 'run')
True
foo = getattr(dog, 'run')
foo
<bound method Dog.run of <__main__.Dog object at 0x7fe3bb094610>>
foo() # equivalent to dog.run()
dog is running...
High-level languages generally come with a built-in error-handling mechanism to help us program.
try:
    print('try...')
    r = 10 / 0
    print('result:', r)
except ZeroDivisionError as e:
    print('except:', e)
finally:
    print('finally...')
print('END')
try... except: division by zero finally... END
try:
    print('try...')
    r = 10 / 3
    print('result:', r)
except ZeroDivisionError as e:
    print('except:', e)
finally:
    print('finally...')
print('END')
try... result: 3.3333333333333335 finally... END
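finally runs at the end whether or not an error occurred.
There are many kinds of errors, and all of them inherit from BaseException. When catching errors, note that an except clause also catches subclasses of the class it names:
import math
def my_sqrt(x):
    print(math.sqrt(x))
try:
    my_sqrt(-2)
except ValueError as e:
    print('ValueError')
except UnicodeError as e:
    print('UnicodeError')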
ValueError
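UnicodeError is a subclass of ValueError, so the second except clause can never catch it: the earlier except ValueError clause already catches it first.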
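A common way around this, sketched below, is to list the more specific exception class before its parent, so that each clause is reachable. The helper name classify is hypothetical.

```python
def classify(exc):
    # Raise the given exception and report which except clause handled it
    try:
        raise exc
    except UnicodeError:          # subclass first
        return 'UnicodeError'
    except ValueError:            # parent class second
        return 'ValueError'

print(issubclass(UnicodeError, ValueError))   # True
print(classify(ValueError('bad value')))      # ValueError
print(classify(UnicodeError('bad text')))     # UnicodeError
```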
To run some additional code only when no error occurred, add an else clause.
try:
    print('try...')
    r = 10 / int('2')
    print('result:', r)
except ValueError as e:
    print('ValueError:', e)
except ZeroDivisionError as e:
    print('ZeroDivisionError:', e)
else:
    print('no error!')
finally:
    print('finally...')
print('END')
try... result: 5.0 no error! finally... END
The error call stack (traceback)
def foo(s):
    return 10 / int(s)
def bar(s):
    return foo(s) * 2
def main():
    bar('0')
main()
--------------------------------------------------------------------------- ZeroDivisionError Traceback (most recent call last) Cell In[496], line 10 7 def main(): 8 bar('0') ---> 10 main() Cell In[496], line 8, in main() 7 def main(): ----> 8 bar('0') Cell In[496], line 5, in bar(s) 4 def bar(s): ----> 5 return foo(s) * 2 Cell In[496], line 2, in foo(s) 1 def foo(s): ----> 2 return 10 / int(s) ZeroDivisionError: division by zero
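The same call chain can be captured as text instead of stopping the program, using the standard library's traceback module. This is a sketch under the assumption that main should handle the error itself; the functions foo and bar are repeated so that the block is self-contained.

```python
import traceback

def foo(s):
    return 10 / int(s)

def bar(s):
    return foo(s) * 2

def main():
    try:
        bar('0')
    except ZeroDivisionError:
        # format_exc() returns the traceback of the exception being handled as a string
        return traceback.format_exc()
    return None

report = main()
print(report)   # lists the frames main -> bar -> foo, then the error message
```

Reading the report from the bottom up gives the error first and then the chain of calls that led to it, which is the same order of inspection that works for the traceback shown above.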