Section 1 Review

matplotlib learning link : https://wikidocs.net/92085
Summary of frequently used read_csv() parameters
pandas date/time reference : https://mindscale.kr/course/pandas-basic/datetime/
pandas basic features (cheat sheet)
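- dataframe.select_dtypes(include=' ') or dataframe.select_dtypes(exclude=' ') : select columns by dtype (a list of several dtypes is also accepted)
- dataframe.loc[::-1] or dataframe.loc[:, ::-1] : reverse the row order, or reverse the column order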
- df = df.apply(pd.to_numeric, errors='coerce').fillna(0) : convert the whole data frame to numeric; values that cannot be parsed become NaN, then NaN is filled with 0
- stock_files = sorted(glob('data/stocks*.csv')) : use glob to collect files with similar names into a sorted list
- pd.concat((pd.read_csv(file) for file in stock_files), ignore_index=True) : concatenate the files collected by glob row-wise (use axis=1 for column-wise) and discard the original indexes
- dataframe[~condition] : select every row that does not meet the condition
- dataframe.column.value_counts() : count how many times each distinct value appears in the column
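- counts.nlargest(3).index : nlargest(3) alone returns a Series holding the three largest values together with their labels; adding .index returns only the labels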
- dataframe.isna().mean() : check the share of missing values in each column of the data frame
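- ufo.dropna(thresh=len(ufo)*0.9, axis='columns').head() : drop columns where more than 10% of the values are missing (a column is kept only if at least 90% of its values are present)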
- df[['first', 'middle', 'last']] = df.name.str.split(' ', expand=True) : split the name column on spaces and expand the result into a data frame; the new columns are named first, middle, last
- dataframe.groupby('column1').column2.agg(['sum', 'count']).head() : aggregate column2 within each group of column1, applying several aggregation functions at once
- MultiIndex Series → DataFrame : unstack()
- pd.cut(titanic.Age, bins=[0, 18, 25, 99], labels=['child', 'young adult', 'adult']) : convert continuous data to categorical data by splitting it into bins and labeling each bin
- pd.set_option('display.float_format', '{:.2f}'.format) : display floats with 2 decimal places
- format_dict = {'Date':'{:%m/%d/%y}', 'Close':'${:.2f}', 'Volume':'{:,}'}
  stocks.style.format(format_dict)
  
  → set the display format of each column
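
Several of the one-liners above can be tried together on a small data frame. The sketch below is only an illustration: the data frame and its column names (name, team, score, age) are made up and do not come from the course data.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "name": ["Ann Lee Kim", "Bob Ray Park", "Cho Min Seo", "Dan Jay Choi"],
    "team": ["a", "b", "a", "a"],
    "score": ["10", "x", "30", "40"],   # "x" cannot be parsed as a number
    "age": [15, 22, 40, None],          # one missing value
})

# errors='coerce' turns unparseable values into NaN; fillna(0) then replaces them.
df["score"] = pd.to_numeric(df["score"], errors="coerce").fillna(0)

# ~ negates a boolean mask: rows whose team is NOT "a".
not_a = df[~(df["team"] == "a")]

# value_counts() counts each distinct value; nlargest(n) keeps the top n
# as a Series, and .index returns only their labels.
counts = df["team"].value_counts()
top = counts.nlargest(1)
top_labels = top.index

# Share of missing values per column.
na_ratio = df.isna().mean()

# Keep only columns in which at least 90% of the values are present.
dense = df.dropna(thresh=len(df) * 0.9, axis="columns")

# Split one text column into three new columns.
df[["first", "middle", "last"]] = df["name"].str.split(" ", expand=True)

# Several aggregations per group.
agg = df.groupby("team")["score"].agg(["sum", "count"])

# A Series with a two-level index becomes a DataFrame via unstack().
wide = df.groupby(["team", "first"])["score"].sum().unstack()

# Bin a continuous column into labeled categories.
age_group = pd.cut(df["age"], bins=[0, 18, 25, 99],
                   labels=["child", "young adult", "adult"])
```

Here top is a Series (label plus count), while top_labels holds only the label, which is the difference asked about for counts.nlargest(3).index.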
Data types
- Discrete (categorical)
  - Nominal : frequency, proportion, percent / bar, pie chart
  - Ordinal : frequency, proportion, percent, percentiles, mode, median, interquartile range / bar, pie chart
- Continuous : percentiles, median, mode, mean, standard deviation, range, interquartile range (IQR) / histogram, boxplot
  - Interval
  - Ratio
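
As a rough sketch of how the summaries listed above map onto pandas calls, the example below uses a made-up sample; the columns color, size and height are hypothetical and only stand in for a nominal, an ordinal and a continuous variable.

```python
import pandas as pd

# Hypothetical sample, invented purely for illustration.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],   # nominal
    "size": ["S", "M", "L", "M"],               # ordinal
    "height": [150.0, 160.0, 170.0, 180.0],     # continuous
})

# Nominal: frequency and proportion of each category.
freq = df["color"].value_counts()
prop = df["color"].value_counts(normalize=True)

# Ordinal: declare the category order explicitly, then take the mode.
sizes = pd.Series(pd.Categorical(df["size"],
                                 categories=["S", "M", "L"],
                                 ordered=True))
size_mode = sizes.mode()[0]

# Continuous: measures of center and spread.
q1, q3 = df["height"].quantile([0.25, 0.75])
summary = {
    "mean": df["height"].mean(),
    "median": df["height"].median(),
    "std": df["height"].std(),
    "range": df["height"].max() - df["height"].min(),
    "iqr": q3 - q1,
}
```

Bar and pie charts would be drawn from freq or prop, and a histogram or boxplot from the height column.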
Section 1 Summary