Differences

This shows you the differences between two versions of the page.

software_development:python_pandas [2022/08/04 14:46]
prgram [Pivot_table]
software_development:python_pandas [2023/05/16 15:09]
prgram [encoding_errors - 'ignore']
Line 1: Line 1:
 ====== python pandas ======
 {{INLINETOC}}
 +
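 +=== etc: list ===
 +<code python>
 +set(my_list)     # unique values (order not preserved)
 +my_list.sort()   # sorts in place and returns None; sorted(my_list) returns a new list
 +list1 + list2    # concatenate two lists into a new list
 +</code>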
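A quick check of the list notes above, using made-up values. `sorted()` and `dict.fromkeys()` are additions not in the original notes: the first returns a new sorted list, the second drops duplicates while keeping first-seen order.

<code python>
nums = [3, 1, 2, 3]  # made-up sample data

unique = set(nums)                      # {1, 2, 3}; a set has no guaranteed order
ordered_unique = sorted(set(nums))      # [1, 2, 3]
first_seen = list(dict.fromkeys(nums))  # [3, 1, 2]: dedupe, keep first-seen order

result = nums.sort()  # sorts nums itself; the return value is None
# now nums == [1, 2, 3, 3] and result is None

copy_sorted = sorted([3, 1, 2])  # new list [1, 2, 3]; the input is untouched
combined = [1, 2] + [3]          # new list [1, 2, 3]
</code>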
  
 ===== shape of df =====
Line 20: Line 27:
 <code python>
 df.groupby([COLUMNS]).agg({'COLUMN': 'sum'}).reset_index()
 +
 +df.groupby([COLUMNS])['COLUMN'].max().reset_index()
  
 # '코드' = ticker code, '종목명' = stock name: earliest numeric 'date' per stock
 df = df.assign(date=pd.to_numeric(df['date'], errors='coerce')).groupby(['코드', '종목명']).agg({'date': 'min'}).reset_index().drop_duplicates()
Line 53: Line 62:
 df.columns = ['1'] + df.columns[1:].tolist()  # rename only the first column to '1'
 </code>
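Minimal sketch with a made-up three-column frame, comparing the list rebuild above with `DataFrame.rename` keyed by the old first label. It assumes column labels are unique; with duplicate labels, `rename` would change every matching column.

<code python>
import pandas as pd

# Hypothetical frame
df = pd.DataFrame({'x': [1], 'y': [2], 'z': [3]})

# Rebuild the whole label list: new first label + the remaining labels unchanged
df.columns = ['1'] + df.columns[1:].tolist()

# Same result with rename, keyed by whatever the first label currently is
df2 = pd.DataFrame({'x': [1], 'y': [2], 'z': [3]})
df2 = df2.rename(columns={df2.columns[0]: '1'})
</code>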
 +
 +=== order of columns ===
 +<code python>
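 +# 1) sort columns by label; level = MultiIndex level name or number
 +df = df.sort_index(axis='columns', level='MULTILEVEL INDEX NAME/no')
 +# 2) explicit column order
 +df.columns  # check the current order
 +col_order = ['a', 'b', 'c']
 +df = df.reindex(col_order, axis='columns')  # labels not in df become all-NaN columns
 +</code>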
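A small sketch of both approaches on made-up data (the column names and the `outer`/`inner` level names are hypothetical). `reindex` silently adds missing labels as NaN columns, while plain `df[[...]]` raises `KeyError`, which can be the safer choice when a typo in `col_order` should fail loudly.

<code python>
import pandas as pd

# Hypothetical frame with columns out of order
df = pd.DataFrame({'c': [1], 'a': [2], 'b': [3]})

# 1) sort columns by label
by_label = df.sort_index(axis='columns')

# 1b) MultiIndex columns: sort by one level, given by name (or position)
cols = pd.MultiIndex.from_tuples([('y', 2), ('x', 1), ('y', 1)], names=['outer', 'inner'])
mi = pd.DataFrame([[1, 2, 3]], columns=cols)
by_inner = mi.sort_index(axis='columns', level='inner')

# 2) explicit order; 'd' is not in df, so reindex adds it filled with NaN
ordered = df.reindex(['b', 'c', 'a', 'd'], axis='columns')

# plain selection also reorders, but raises KeyError for a missing label
ordered_strict = df[['b', 'c', 'a']]
</code>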
 +
  
 === map ===
Line 87: Line 107:
 # iloc: select by position
 # loc: select by label
 +
 +df.loc[:, ~df.columns.isin(['a', 'b'])]  # all columns except 'a' and 'b'
 +
 +# wrap each comparison in (): & binds tighter than ==
 +df[~(df['a'].isin(['1', '2', '3']) & (df['b'] == '3'))]  # row-wise
 +df.loc[~(df['a'].isin(['1', '2', '3']) & (df['b'] == '3')), 8]  # row-wise & column (label 8)
 </code>
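A sketch of the masks above on made-up string data. Python's `&` binds tighter than `==`, so without parentheses `df['a'].isin(...) & df['b'] == '3'` is parsed as `(df['a'].isin(...) & df['b']) == '3'`, which is not the intended mask. Building the mask once also avoids repeating it.

<code python>
import pandas as pd

# Hypothetical data; 'a' and 'b' hold strings, as in the notes above
df = pd.DataFrame({
    'a': ['1', '2', '4', '3'],
    'b': ['3', '5', '3', '3'],
    'c': [10, 20, 30, 40],
})

# True where 'a' is one of '1'/'2'/'3' AND 'b' equals '3'
mask = df['a'].isin(['1', '2', '3']) & (df['b'] == '3')

kept_rows = df[~mask]                            # row-wise: drop rows where both conditions hold
no_ab = df.loc[:, ~df.columns.isin(['a', 'b'])]  # every column except 'a' and 'b'
c_of_kept = df.loc[~mask, 'c']                   # rows by mask, one column by label
</code>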
  
Line 98: Line 123:
 
 ===== I/O file =====
 +
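 +=== encoding_errors - 'ignore' ===
 +If reading still fails even with the correct encoding set, ignore the bytes that cannot be decoded.
 +Public-sector open data often has this problem.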
 +
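 +<code python>
 +import chardet
 +import pandas as pd
 +
 +# guess the encoding from the first 100,000 bytes
 +with open(file, 'rb') as rawdata:
 +    result = chardet.detect(rawdata.read(100000))
 +result  # dict with 'encoding', 'confidence', 'language'
 +
 +# cp949 = Korean Windows encoding; drop bytes that still cannot be decoded (pandas >= 1.3)
 +data = pd.read_csv(file, encoding='cp949', encoding_errors='ignore')
 +# on_bad_lines='skip'     # pandas >= 1.3: skip malformed rows
 +# error_bad_lines=False   # older form; deprecated in 1.3, removed in 2.0
 +</code>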
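A self-contained sketch of `encoding_errors='ignore'`, using an in-memory file instead of a real public-data CSV. The bytes are made up, and `chardet` is left out so it runs without extra packages. It needs pandas 1.3 or later. Note that `'ignore'` silently drops the bad bytes, so the affected values may be subtly wrong.

<code python>
import io

import pandas as pd

# Hypothetical CSV: a cp949-encoded Korean header plus one stray 0xFF byte,
# which is not valid in cp949 (stands in for a corrupted public-data file).
raw = '코드,value\n'.encode('cp949') + b'A\xff1,10\nB2,20\n'

# With the default (strict) handling the stray byte makes decoding fail;
# with 'ignore' the byte is dropped and the rest of the file is read.
df = pd.read_csv(io.BytesIO(raw), encoding='cp949', encoding_errors='ignore')
</code>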
  
 === to_numeric ===