Differences

This shows you the differences between two versions of the page.

--- software_development:python_pandas [2022/08/04 05:46] – [Pivot_table] prgram
+++ software_development:python_pandas [2023/05/16 06:09] – [encoding_errors - 'ignore'] prgram
@@ Line 1: / Line 1: @@
 ====== python pandas ======
 {{INLINETOC}}
+=== etc : list ===
+<code python>
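+set( [list] ) # unique values (returns a set; order is not preserved)
+[list].sort() # sorts in place and returns None; use sorted([list]) for a new list
+[list1] + [list2]  # concatenate lists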
+</code>
 ===== shape of df =====
@@ Line 20: / Line 27: @@
 <code python>
 df.groupby([COLUMNS]).agg({'COLUMN':sum}).reset_index()
+df.groupby([COLUMNS])['COLUMN'].max().reset_index()
 df = df.assign(date=pd.to_numeric(df['date'], errors='coerce')).groupby(['코드', '종목명']).agg({'date':np.min}).reset_index().drop_duplicates()
@@ Line 53: / Line 62: @@
 df.columns = ['1'] + df.columns[1:].tolist()
 </code>
+=== order of columns ===
+<code python>
+#1
+df = df.sort_index(axis='columns', level = 'MULTILEVEL INDEX NAME/no')
+#2
+df.columns
+col_order = ['a','b','c']
+df = df.reindex(col_order, axis='columns')
+</code>
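A minimal sketch of the two approaches above, run on a made-up frame (the column names `a`, `b`, `c` and the values are illustrative assumptions, not from the original page). `sort_index(axis='columns')` sorts the column labels, while `reindex` imposes an explicit order.

```python
import pandas as pd

# Hypothetical frame with columns deliberately out of order
df = pd.DataFrame({'c': [1, 2], 'a': [3, 4], 'b': [5, 6]})

# 1: sort the column labels
sorted_df = df.sort_index(axis='columns')

# 2: impose an explicit order; a label missing from df would come back as an all-NaN column
col_order = ['b', 'c', 'a']
ordered_df = df.reindex(col_order, axis='columns')

print(sorted_df.columns.tolist())
print(ordered_df.columns.tolist())
```

`reindex` is probably the safer choice when the order is not alphabetical, since the list states the intended layout explicitly.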
 === map ===
@@ Line 87: / Line 107: @@
 iloc: Select by position
 loc: Select by label
+df.loc[:,~df.columns.isin(['a','b'])]
+df[~( df['a'].isin(['1','2','3']) & (df['b']=='3') )]		# row-wise; parenthesize ==, because & binds tighter than ==
+df.loc[~( df['a'].isin(['1','2','3']) & (df['b']=='3') ), 8]	# row-wise, plus the column labeled 8
 </code>
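A small sketch of mask-based selection on a made-up frame (the column `x` and all values are assumptions for illustration). Each comparison is wrapped in parentheses because `&` has higher precedence than `==` in Python, so an unparenthesized `A & df['b'] == '3'` is evaluated as `(A & df['b']) == '3'`.

```python
import pandas as pd

# Hypothetical data; 'a' and 'b' hold strings, 'x' holds numbers
df = pd.DataFrame({
    'a': ['1', '2', '9', '1'],
    'b': ['3', '0', '3', '3'],
    'x': [10, 20, 30, 40],
})

# True where 'a' is one of the listed values AND 'b' equals '3'
mask = df['a'].isin(['1', '2', '3']) & (df['b'] == '3')

rows_kept = df[~mask]                                 # drop the rows matching both conditions
cols_kept = df.loc[:, ~df.columns.isin(['a', 'b'])]   # drop columns 'a' and 'b'
both = df.loc[~mask, 'x']                             # filter rows and pick one column by label

print(rows_kept)
print(cols_kept.columns.tolist())
print(both.tolist())
```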
@@ Line 98: / Line 123: @@
 =====I/O file=====
+=== encoding_errors - 'ignore'===
+For when reading still fails even though the encoding is set correctly..
+This happens often with public (open government) data.
+<code python>
+import chardet
+with open(file, 'rb') as rawdata:
+    result = chardet.detect(rawdata.read(100000))
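+result  # dict with the guessed 'encoding' and its 'confidence'
+data = pd.read_csv( file, encoding='cp949', encoding_errors='ignore')  # encoding_errors needs pandas >= 1.3
+# other options, for malformed rows:
+# on_bad_lines='skip'       # pandas >= 1.3
+# error_bad_lines=False     # older pandas; deprecated in 1.3, removed in 2.0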
+</code>
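A self-contained sketch of what `encoding_errors='ignore'` does, assuming pandas 1.3 or later. The file contents here are invented for the demonstration: a CP949-encoded CSV with one stray `0xFF` byte, which is not valid CP949. Strict decoding fails, while `'ignore'` silently drops the bad byte.

```python
import os
import tempfile

import pandas as pd

# Invented sample: valid CP949 text followed by a row containing an invalid byte
raw = 'name,value\n서울,1\n'.encode('cp949') + b'bad\xff,2\n'

with tempfile.NamedTemporaryFile(delete=False, suffix='.csv') as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Strict decoding raises on the stray byte
    try:
        raw.decode('cp949')
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # 'ignore' drops undecodable bytes instead of raising
    data = pd.read_csv(path, encoding='cp949', encoding_errors='ignore')
finally:
    os.remove(path)

print(strict_failed)
print(data)
```

Dropped bytes are lost without any warning, so it may be worth trying `encoding_errors='replace'` first to see where the damage is.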
 === to_numeric ===